Reproducibility in quantitative research
Open data is key to understanding one of the big concerns in quantitative research: reproducibility. Assessing reproducibility means assessing the value or accuracy of a scientific claim based on the original methods, data, and code. So, when you run the same analyses on the same data, do you get the same results?
Running the same analyses on the same data can mean different things depending on what materials the reproducer has access to. Investigating the reproducibility of a study can mean taking the original data and:
- Following the description of analyses in the paper.
- Following an analysis plan created by the original authors.
- Re-running the analysis code that has been shared with the data.
As you can imagine, it’s easier to get the same results as the original researchers if there is less uncertainty around what they did. So, re-running the analysis code will be more likely to produce the same results than following the description of analyses in the paper. Going back to our baking analogy, it would usually be easier to produce the same cake as a professional chef if they shared the recipe they used than if they just described what they did, and the more detail they provided in the recipe, the easier it would be. However, even if a professional chef shared both a detailed recipe and a description of what they did, your cake might end up with a soggy bottom! Similarly, in research, when we have both the code and the data, it can still be difficult to reproduce results.
Here are a few tips for making it more likely that others will be able to reproduce the results of a study:
- Share a data dictionary – list all the variables in your dataset, what they mean, how they were manipulated, and how they’re structured
- Annotate your code – make notes of what you did at each stage of the data pre-processing and analysis and why
- Make a note of software versions – analyses might stop working with future versions
- Make sure your data and code are suitably licensed – for example, a CC-BY-NC 4.0 [Tip: hold Ctrl and click a link to open it in a new tab. (Hide tip)] license means that anyone can share or adapt the material as long as they give you appropriate credit and do not use the materials for commercial purposes.
Activity 3:
Allow about 10 minutes
This activity will allow you to test your understanding of reproducibility.
Where to share data
