
When scientists cannot confirm the results of a published study, is that a sign of a problem, or a natural part of the scientific process that can lead to new discoveries? A Consensus Study Report on “Reproducibility and Replicability in Science” by the National Academies of Sciences, Engineering, and Medicine, published last week, defines what it means to reproduce or replicate a study and explores the implications and impact of these issues for the public’s trust in science (see figure above).
- Reproducibility is obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis. This definition is synonymous with “computational reproducibility,” and the terms are used interchangeably in this report.
- Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data. Two studies may be considered to have replicated if they obtain consistent results given the level of uncertainty inherent in the system under study.
- Generalizability refers to the extent that results of a study apply in other contexts or populations that differ from the original one.
Reproducibility involves the original data and code while replicability involves new data collection to test for consistency with previous results of a similar study. The definition of reproducibility focuses on computation because of its large and increasing role in scientific research.
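As a toy illustration of the computational side (not an example from the report), one precondition of reproducibility is controlling every source of randomness: the same input data, code, and seed should always yield the same result. The `analysis` function and its bootstrap resample below are hypothetical, a minimal sketch rather than any real study's pipeline.

```python
import random

def analysis(data, seed=42):
    # Fix the random seed so the same input data, code, and
    # conditions of analysis always yield the same result.
    rng = random.Random(seed)
    # Bootstrap resample: draw 100 values with replacement, return the mean.
    sample = [rng.choice(data) for _ in range(100)]
    return sum(sample) / len(sample)

data = [1.2, 3.4, 2.2, 5.1, 4.8]
run1 = analysis(data)
run2 = analysis(data)
assert run1 == run2  # computationally reproducible: identical output
```

If the seed were taken from the system clock instead, two runs of the very same code on the very same data would disagree, and a reader could not even check the computation, let alone replicate the study.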
The report puts forward a good number of conclusions and recommendations. Among them (a brief, non-exhaustive “mindthepostly” summary):
The NSF should endorse transparency and openness by creating code and data repositories for the long-term preservation of digital artefacts, and by supporting further research and development of open-source, usable tools and infrastructure that support reproducibility.
Statistical inference plays an outsized role in replicability discussions because of the frequent misuse of statistics such as the p-value and of thresholds for determining “statistical significance.” All institutions managing scientific work should include training in the proper use of statistical analysis and inference, and researchers who rely on statistical inference should learn to apply it properly.
A predominant focus on the replicability of individual studies is an inefficient way to assure the reliability of scientific knowledge. Rather, reviewing the cumulative evidence on a subject, to assess both the overall effect size and generalizability, is often a more useful way to gain confidence in the state of scientific knowledge.
Journalists should report on scientific results with as much context and nuance as the medium allows. (Oh, yes please. Can anyone explain to them how to properly quote, reference, and link sources instead of indulging in clickbait?)
Anyone making personal or policy decisions based on scientific evidence should be wary of making a serious decision based on the results, no matter how promising, of a single study.
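The statistics point above is easy to demonstrate with a simulation (my own sketch, not from the report): if many studies test a true effect of exactly zero, roughly 5% of them will still clear the p < 0.05 threshold by chance alone, which is one reason a single “significant” result proves little. The simple two-sided z-test approximation below is a stand-in for whatever test a real study would use.

```python
import random
import statistics

def two_sample_p(a, b):
    # Approximate two-sided z-test p-value for a difference in means.
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

rng = random.Random(0)
n_studies = 200
false_positives = 0
for _ in range(n_studies):
    # Both groups are pure noise: the true effect is exactly zero.
    a = [rng.gauss(0, 1) for _ in range(50)]
    b = [rng.gauss(0, 1) for _ in range(50)]
    if two_sample_p(a, b) < 0.05:  # declared "statistically significant"
        false_positives += 1

# Roughly 5% of these null studies cross the threshold by chance alone.
print(false_positives, "of", n_studies, "null studies were 'significant'")
```

This is exactly why the report favors cumulative evidence over any single promising study: the threshold guarantees a steady trickle of false positives even when nothing is there.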
Well, this is what we think is true today, but… don’t take it for granted.
Sleeper (1973) by Woody Allen is set 200 years in the future. (Dialogue quoted in the report.)