
In Week 1, we broke down the idea of open research into three key facets that make research open: transparency, integrity and accessibility. We’re now going to take a deeper dive into integrity: how trustworthy a study is.
This week, you will learn how to recognise and avoid questionable research practices. You will discover why it is often important to be able to replicate other researchers’ findings, and how to go about doing this. You will experience an important test for the integrity of a piece of research: its replicability.
There may be cases where you don’t expect to get the same results if you conduct the same study again. For example, if a study is based around a specific political event, it may be difficult or even impossible to replicate. But in many other types of investigation, we would expect to be able to get the same results when we run the same study again.
In Week 2, we talked about reproducibility – being able to get the same results when conducting the same analyses on the same data as the original study. Replicability is similar, in that it’s about getting the same results as the original study when running the same analyses; the difference is that these analyses are now run on new data. So, replication means conducting the same study again and seeing whether you get the same results.
Replication studies are deliberate attempts to do this. But what does the ‘same study’ mean? There are always going to be differences between the original study and the replication study. Replication studies vary on a spectrum from ‘direct’ to ‘conceptual’. Direct replications try to stay as close to the original study as possible, whereas conceptual replications purposefully vary some aspects to better understand the underlying phenomenon. Here are some examples, from most direct (the first) to most conceptual (the last):
Now let’s dig deeper into the process of designing a replication study.
In the next video, psychologist Priya Silverstein talks about their first forays into conducting a replication study, and lessons learned. As you watch the video, think about what Priya’s results tell us about the process of running a good replication study.

Hi everyone, my name’s Priya Silverstein and I'm a post-doctoral researcher for the Psychological Science Accelerator, and I'm also the author for this course. My pronouns are ‘they/them’.
As part of my PhD, I ran my first replication study. It wasn't meant to be a big part of my PhD, but it ended up being one of the biggest parts!
I thought that before starting any of my own original research, it would make more sense to start with a replication study. However, it wasn’t that simple. When I ran the replication study, surprisingly, we got a null result, and I was a bit confused about why this might be. So the first thing I did was contact the original researchers to ask them what they thought might be the problem.
They got back to me and said that they thought it was because of some differences between the stimuli – that is, the things that I'd shown in my study versus the things that they showed in the original. And some of these differences were things I couldn't have known about, because they didn't outline those specifics in their original paper.
I made some edits to the protocol – the way that I was going to run the study.
And then I thought, okay, now that I've had approval from the original authors this new version should be able to replicate the original study. So I ran it again and surprisingly I still wasn't able to replicate the result.
And so this was quite disappointing, both for me and the original authors, because it meant that I wasn't able to find the same thing that they did.
So... This was my first experience of replications. And you might think that that was enough to put me off doing any more, but instead, quite the opposite. I ended up realising how important replications are.
So yeah, ever since starting with that first replication study as part of my PhD, I've now kind of made that my specialty.
My advice for anyone conducting their own replication study comes from some of the mistakes that I made as part of that first replication study.
So my first piece of advice would be to always contact the authors before you begin your replication study.
I think I was a bit naïve, and thought if I just follow what's written in the paper then how can I go wrong? But papers don't have enough space to include everything about a study that you would need to know in order to conduct a good replication.
So I'd recommend talking to the original authors and coming to an agreement with them – making sure they agree that the protocol you've proposed is a good-faith replication attempt of their study.
Another thing that I did wrong is that I only collected the same number of participants as in the original study, because I thought that made it more ‘replication-y’. But now, after learning more about replication studies, and about sample size more generally, I would really recommend going with a much larger sample than the original study that you're replicating.
And this is just so that you can be a little bit more sure about what your findings mean. So, in my study, I wasn't able to replicate the same result as the original authors, but this could just be because the true effect size for the effect I was looking at is smaller than what they measured in the original study.
If I had used a much larger number of participants and still wasn't able to replicate the study, we could be a bit more sure that it wasn't just because of a small sample size.
I ran my study just as a normal study where we finished the entire study and then submitted it to a journal for publication. And I was lucky that it was successful in getting published.
But it could have been a lot harder for me to publish, which would have been a bit disappointing and taken a lot of time. So what I would recommend instead is submitting any replication study as a registered report.
A registered report is essentially where your paper gets peer-reviewed before you have collected data. So the peer reviewers say whether your protocol makes sense, recommend any suggested changes, and then once they've accepted it, the journal agrees to accept your study, regardless of what the outcome is. So that would be my third piece of advice.
Use this box to write your comments on what Priya advises.
Allow about 10 minutes for this.
When you are ready, press 'reveal' to see our comments.
There are fields and methodologies where the value of replication is hotly debated. For instance:
It is important to think carefully about whether replication makes sense for your field and methodology.
If you are working in a field where replication is important, and your study successfully replicates the one you are trying to replicate, you can be reasonably confident about the result.
But what does it mean if, like Priya’s first attempts, your study does not replicate? One explanation could be that the original result was a ‘false positive’, and so the failed replication is a ‘true negative’. Another explanation is that the replication result was a ‘false negative’, and that the original study was a ‘true positive’. It’s also possible that differences between the two studies are responsible for the different results.
Allow about 20 minutes
This activity relates to our examples of typical direct and conceptual replication studies. By way of reminder:
Now imagine these two researchers both carry out their studies. List the reasons why each of these two researchers may not replicate the original result.
When you are ready, press 'reveal' to see our comments.
You might have listed:
This section will highlight some of the issues around replication in quantitative research. Replication is possible in qualitative research, and many qualitative researchers see the value of replication. So if you are a qualitative researcher, this section is still relevant to you. It will allow you to explore key issues faced by quantitative colleagues, learn how to read quantitative research papers more critically, and think about whether these issues could also apply to qualitative research, albeit manifested differently.
If we consider relatively direct replications, using the same materials as the original authors but conducted by different researchers, what percentage of published results do you imagine would replicate? It would be tempting to think that most published research findings are true, and therefore that a replication of a published research finding would be pretty likely to find the same result. However, researchers have found that the following percentages of findings could not be replicated:
The number of studies that could not be replicated was much higher than expected in certain fields, which has led some to refer to this as a ‘replication crisis’.
Why is it that so many quantitative studies cannot be replicated? It’s complicated!
Previously, you learned three possible explanations for a failed replication: the original result was a false positive, the replication result was a false negative, or differences between the two studies were responsible for the different results. However, these three interpretations are not all equally likely, and there are ways to try to work out which is most likely.
The original result being a false positive is more likely than you would think. Researchers often do not publish all the research that they do. As a researcher, there is an incentive to publish papers in ‘high impact’ journals (journals that are regarded highly in the researcher’s discipline, and that publish papers that receive a high number of citations). Historically, it has been harder to publish negative (null) results than positive (statistically significant) results, as journals have prioritised headline-grabbing results that confirm popular or contemporary positions. This has been the case for all journals, but especially high-impact ones.
This means that researchers have an incentive to get positive results in their research, and can feel disappointed, stressed, and even ashamed if they don’t get a significant result. This can entice them to turn to questionable research practices, which increase the likelihood of a false positive result.
Yes, you read the end of the previous section correctly. There are questionable research practices that researchers may feel pressurised to use. Here are some examples:
Although pressures to publish can sometimes be seen as barriers to transparency, the benefits of writing transparently can also be seen as a positive incentive, as the next section shows.
When writing manuscripts, researchers should aim to be as transparent as possible, being honest about what happened in the study, how it was conducted, and when and why decisions were made. By using questionable research practices, researchers make it more likely that they get a false positive result, which can partially explain low replicability rates.
In the video, Priya introduced another important consideration for evaluating replication results: sample size (the number of observations in your study, e.g. participants). Smaller sample sizes make both false positive and false negative results more likely. This is because smaller samples provide less information about the population you are studying, which increases the variability and uncertainty in your results. With a small sample, random variation (or 'noise') can more easily overshadow the true effect you are trying to measure. This means you might detect an effect that isn’t really there (a false positive) or miss an effect that actually exists (a false negative).
For instance, imagine trying to judge the average height of a population by looking at just a few individuals. Your estimate is more likely to be off compared to measuring a larger group, because you may happen to have either a very tall or very short person in your sample. So, if you have an original study with a small sample size and a (well-designed) replication with a large sample size, you could be more confident in the result of the replication than the result of the original study.
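To make the effect of sample size concrete, here is a minimal simulation sketch in Python. It is not part of the course materials: the 'true' effect of 0.3 standard deviations, the sample sizes, and the number of simulated studies are illustrative assumptions only.

```python
# Illustrative sketch (assumed numbers, not course data): how often does a
# two-group study detect a modest true effect, depending on sample size?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_simulations = 2000          # simulated studies per condition (assumption)
TRUE_EFFECT = 0.3             # assumed true difference, in standard deviations


def significant_fraction(true_effect: float, sample_size: int) -> float:
    """Fraction of simulated studies whose t-test gives p < 0.05."""
    hits = 0
    for _ in range(n_simulations):
        control = rng.normal(0.0, 1.0, sample_size)
        treatment = rng.normal(true_effect, 1.0, sample_size)
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:
            hits += 1
    return hits / n_simulations


for n in (20, 200):
    power = significant_fraction(TRUE_EFFECT, n)
    print(f"n = {n:>3} per group: detects the real effect in {power:.0%} of studies")
```

With these assumed numbers, the small studies miss the real effect most of the time (false negatives), and any single significant result from such a study is correspondingly harder to interpret; the larger studies detect it far more reliably.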
What not to do!
Allow about 30 minutes
So far, you have considered good and bad writing practices. With these in mind, have a go at this ‘hack your way to scientific glory’ activity. First, choose a political party: Republican [UK equivalent: Conservative] or Democrat [UK equivalent: Labour]. Then predict whether the party has a positive or negative impact on the economy. When you have done that, change aspects of the research (e.g. participant inclusion criteria and how you’re measuring your dependent variable) and see whether you can find a significant result (p < 0.05) in your predicted direction.
The reason this is an example of ‘what not to do’ is because when you first choose a political party and predict whether they will have a positive or negative impact on the economy, you are forming a hypothesis. But, if you then play around with the data until you get the result that you wanted, and only stop when you do, then you are fixing the result.
The activity involves various questionable research practices, such as p-hacking, HARKing, and selective reporting. However, there is a way to run different analyses on the same data without any of these being a problem. If, instead of deciding on a hypothesis first and then confirming it, you were to conduct purely exploratory research (without a hypothesis), you could be transparent about all the different ways you looked at the data and how the results differed when you tried different things. This could even lead to people conducting their own future studies to confirm your exploratory results!
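To see why this matters, here is a hedged sketch in Python (this is not the FiveThirtyEight tool itself, and the variables are made up): both the 'political' predictor and the 'economic' outcomes are pure noise, yet trying several analysis choices and stopping at the first significant result produces far more false positives than a single pre-specified test.

```python
# Illustrative sketch with fabricated noise data: flexible analysis choices
# inflate the false-positive rate well above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n = 2000, 40   # simulated datasets and observations per dataset (assumptions)

single_test_hits = 0
hacked_hits = 0
for _ in range(n_studies):
    x = rng.normal(size=n)                # hypothetical "party in power" measure (noise)
    outcomes = rng.normal(size=(4, n))    # four candidate economic measures (noise)

    # Honest analysis: one pre-specified outcome, full sample.
    if stats.pearsonr(x, outcomes[0])[1] < 0.05:
        single_test_hits += 1

    # 'Hacked' analysis: try every outcome, with and without an arbitrary
    # exclusion rule, and count a hit if any combination reaches p < 0.05.
    found = False
    for y in outcomes:
        for keep in (slice(None), slice(0, 30)):
            if stats.pearsonr(x[keep], y[keep])[1] < 0.05:
                found = True
    hacked_hits += found

print(f"False positives, one pre-specified test: {single_test_hits / n_studies:.1%}")
print(f"False positives, trying eight analyses:  {hacked_hits / n_studies:.1%}")
```

Under these assumptions, the pre-specified test produces false positives at roughly the nominal 5% rate, while the flexible strategy does so several times more often, even though there is nothing real to find. Reporting every analysis transparently, or pre-registering one, is what keeps this flexibility honest.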
When reading an academic paper, it’s important to read with a critical mindset and feel free to disagree with the methodological or analysis strategy, the interpretation of the results, or the conclusions drawn. Although there are rare instances of outright fraud in science, we would generally expect that researchers are truthfully describing what happened in the study, how it was conducted, and when and why decisions were made.
You have learned that replication studies vary on a spectrum from ‘direct’ to ‘conceptual’. However, most replication studies have some differences from the original study, even if these weren’t intentional. Consider one of the examples from before, where a researcher was replicating a paper from the 1990s. The materials they create will be different from the original materials, and if what they’re studying is context-dependent, a lot might have changed since then.
For example, a study on internet usage habits conducted in the 1990s would yield very different results if replicated today, due to the dramatic changes in technology and how people use the internet. Similarly, a study examining public attitudes toward mental health in the 1990s might produce different findings now because societal awareness and acceptance of mental health issues have evolved significantly over the past few decades.
For this reason, some consider that most replication studies are actually generalisability studies. Generalisability is the extent to which a particular result extends beyond the specific participants and conditions of the original study to broader samples, settings, methods, or measures. For example, if we’re interested in public attitudes to mental health, it wouldn’t make sense for us to only ask people aged 50-60, or only men, or only those living in cities. It’s possible that any of these characteristics could affect people’s opinions on mental health, meaning the results would be biased and not representative of the full population.
Without generalisability studies, an incorrect theoretical explanation for why a finding occurred might go unchallenged; there could even be a mistake in the design of the study that biased the results. For instance, imagine a biological study investigating the effects of a new drug using a specific strain of lab mice. If this particular strain has a unique genetic mutation that makes it respond differently to the drug compared with other strains, the study’s results might not generalise to other mice or to humans. This could lead to an incorrect conclusion about the drug’s overall effectiveness and safety.
Researchers wishing to be transparent when writing their papers should declare possible ‘Constraints on generality’ in the discussion section. This could take the form of a statement that identifies and justifies the target populations for the reported findings, and other considerations the authors think would be necessary for replicating their result. This could help other researchers to sample from the same populations when conducting a direct replication, or to test the boundaries of generalisability when conducting a conceptual replication.
So you think your research has the potential to do good in the world, but don’t know how widely it can be applied? There are lots of different ways to study generalisability:
Allow about 10 minutes
Think about when a study in your field would or wouldn’t generalise, and make a few notes as to why this might be the case.
When you are ready, press 'reveal' to see our comments.
There are lots of reasons why a study may or may not generalise. Imagine a study evaluating a new therapy for depression in a university clinic with primarily urban-based participants. While the therapy showed significant improvement in depressive symptoms over ten weeks among a diverse sample, including college students and middle-aged adults of various ethnicities recruited through local health centres and university channels, its applicability to other populations and settings may be limited. Factors such as regional differences in mental health resources, demographic diversity beyond the studied age groups, and recruitment biases could affect the therapy's effectiveness in rural or suburban areas and among older adults or adolescents.

Throughout the course, we offer you self-test quizzes to help you test your understanding of the course concepts. These quizzes are there to help you consolidate your knowledge. This one tackles key ideas in replication and the principle of generalisability. It is important to answer the questions carefully, then read the feedback, whether you got the answer right or not.
Answer the questions on the following pages:
This week you learned about an important aspect of integrity: replicability. Replicability relates to whether or not a study ‘replicates’, i.e. whether or not, when you repeat the study with new data, you get the same result. You learned some reasons why replicability may be low in many fields, and how differences between studies may sometimes contribute to this. You also learned about the importance of generalisability in research. Next week, you’ll learn techniques which can support both the integrity and the transparency of your research.
Camerer, C, Dreber, A, Forsell, E, Ho, T, Huber, J, Johannesson, M, Kirchler, M, Almenberg, J, Altmejd, A, Chan, T, Heikensten, E, Holzmeister, F, Imai, T, Isaksson, S, Nave, G, Pfeiffer, T, Razen, M, Wu, H (2016): Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433-1436.
Available at: https://doi.org/10.1126/science.aaf0918
Cova, F, Strickland, B, Abatista, A (2021): Estimating the reproducibility of experimental philosophy. Review of Philosophy and Psychology, 12(1), 9-44.
Available at: https://www.researchgate.net/publication/325216701_Estimating_the_Reproducibility_of_Experimental_Philosophy
Ebersole, C. R., Mathur, M. B., Baranski, E., et al. (2020): Many Labs 5: Testing pre-data-collection peer review as an intervention to increase replicability. Advances in Methods and Practices in Psychological Science, 3(3), 309-331.
Available at: https://doi.org/10.1177/2515245920958687
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016): Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82.
Available at: https://doi.org/10.1016/j.jesp.2015.10.012
Errington, T, Mathur, M, Soderberg, C, Denis, A, Perfito, N, Iorns, E, Nosek, B (2021): Investigating the replicability of preclinical cancer biology. eLife.
Available at: https://doi.org/10.7554/eLife.71601
FiveThirtyEight.com: Hack your way to scientific glory (website)
Available at: https://projects.fivethirtyeight.com/p-hacking/
FORRT (2024): Lesson plan 8: open data and qualitative research (lesson template with a CC BY Attribution 4.0 licence).
Available at: https://osf.io/nyfqx
Hofstede, G. (1980): Culture's consequences: international differences in work-related values. Beverly Hills, CA: Sage Publications.
Available at: https://books.google.co.uk/books/about/Culture_s_Consequences.html?id=Cayp_Um4O9gC&redir_esc=y
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Jr., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., . . . Nosek, B. A. (2014): Investigating variation in replicability: A ‘many labs’ replication project. Social Psychology, 45(3), 142–152.
Available at: https://doi.org/10.1027/1864-9335/a000178
ManyGoats (2025): https://www.themanygoatsproject.com/
Open Science Collaboration (2015): Estimating the reproducibility of psychological science. Science, 349, aac4716.
Available at: https://doi.org/10.1126/science.aac4716
Silverstein, P. (2020): Evaluating the replicability and specificity of evidence for natural pedagogy theory.
Available at: https://www.research.lancs.ac.uk/portal/en/publications/evaluating-the-replicability-and-specificity-of-evidence-for-natural-pedagogy-theory(39b30b8b-7701-45b9-9009-d2d43bd5a006).html
Silverstein, P., Gliga, T., Westermann, G., Parise, E.: Probing communication induced biases in preverbal infants: two replication attempts of Yoon, Johnson and Csibra. Infant Behaviour and Development, 55, 77-87.
Available at: https://www.sciencedirect.com/science/article/pii/S0163638318301474?via%3Dihub
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017): Constraints on Generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123-1128.
Available at: https://doi.org/10.1177/1745691617708630
UNICEF Innocenti (2022): Evidence and Gap Map Research Briefs: UNICEF Strategic Plan 2018–2021 Goal Areas
Available at: https://www.unicef.org/innocenti/reports/evidence-and-gap-map-research-briefs