Will we ever be able to predict future scientific discoveries?

The desire to predict discoveries pervades modern science. The more predictable we can make the process of scientific discovery, the more efficiently resources will be used to support technological, biomedical, and scientific advances. However, our present understanding of how discoveries emerge is limited, and relatively few predictions by individuals, publishers, funding agencies, or hiring committees are made in a scientific way.

The statistical occurrence of innovations shows striking regularities, which offer a starting point for deeper insight into the whole phenomenology(1). The widespread availability of bibliographic databases and online platforms (Google Scholar, PubMed, JSTOR and others) is enabling a new generation of researchers to develop deeper insights into the scientific process. These efforts raise a provocative question: Will we eventually be able to predict important discoveries or their discoverers?

This is the question that Aaron Clauset, Daniel B. Larremore, and Roberta Sinatra formulate in an essay(2) published in Science on 3 February 2017. Their analysis and discussion make for a piece of the “science of science” well worth reading. As you may guess, the answer is not simple, and there is a dark side.

The history of scientific discovery provides evidence of both kinds: discoveries that can be reasonably anticipated as theory and evidence accumulate (like the observation of gravitational waves, the structure of DNA, or the decoding of the human genome) and others (like antibiotics, programmable gene editing, and cosmic microwave background radiation) that seem impossible to predict. The latter represent puzzle pieces that change how we think the puzzle is organised or that find new uses in underdeveloped parts of the puzzle. (Surely, these are the ones that introduce paradigm shifts(4).)

The authors explain how science’s social fabric gives rise to the preferential attachment mechanism, which explains why citations are distributed so unevenly across papers, and why some receive hundreds or even thousands of times more attention than the typical paper. This model also makes remarkably good predictions for how citations accumulate within a developing field. However, some discoveries do not follow these rules. For instance, there are papers that far exceed the predictions made by simple preferential attachment. And then there are the sleeping beauties, and the funerals.
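To make the preferential attachment idea concrete, here is a minimal simulation sketch in Python. It is not the model used in the essay: the function name `simulate_citations`, its parameters, and the rule that a paper is cited with probability proportional to its current citations plus one are all assumptions chosen for illustration.

```python
import random


def simulate_citations(n_papers=2000, refs_per_paper=5, seed=0):
    """Grow a toy citation network by preferential attachment.

    Assumption (illustrative only): each new paper cites `refs_per_paper`
    distinct earlier papers, each chosen with probability proportional to
    (citations already received + 1).
    Returns a list where entry i is the citation count of paper i.
    """
    rng = random.Random(seed)
    citations = [0]  # paper 0 starts with no citations
    urn = [0]        # paper i appears (citations[i] + 1) times in the urn
    for new_paper in range(1, n_papers):
        # A paper cannot cite more distinct papers than already exist.
        k = min(refs_per_paper, new_paper)
        cited = set()
        while len(cited) < k:
            # Drawing uniformly from the urn favours already-cited papers.
            cited.add(rng.choice(urn))
        for paper in cited:
            citations[paper] += 1
            urn.append(paper)
        # Register the new paper itself, so it can be cited later.
        citations.append(0)
        urn.append(new_paper)
    return citations


if __name__ == "__main__":
    counts = simulate_citations()
    ranked = sorted(counts, reverse=True)
    print("most cited paper:", ranked[0])
    print("median paper:", ranked[len(ranked) // 2])
```

Running it should show a handful of early papers collecting a large share of all citations while the typical paper gets only a few, which is the "rich get richer" pattern described above. Sleeping beauties and other outliers are exactly what such a simple rule cannot reproduce.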

Researchers have investigated the predictability of individual scientists’ performance and achievements over the course of a career. The conventional narrative(3) is that, after being hired, a researcher’s productivity tends to rise rapidly to an early peak and then gradually declines (see figure below). It was also widely believed that the early to middle years of a career are more likely to produce a scientist’s “personal best” discovery (i.e., their most highly cited result).

[Figure: productivity trajectory parameters]

More recent studies show that there is no correlation between the impact of a discovery and its timing within a scientist’s career. In fact, the personal best is more likely to occur in the more productive phases of a scientist’s career, which may or may not coincide with the early years. It is also worth noting that grant proposals led by female or non-white investigators, or those focused on interdisciplinary research, are less likely to receive funding.

Citations and publications are measures of past success that exhibit a strong feedback loop. When combined with the hyper-competitive nature of modern scientific publishing, funding, and hiring, they can create dramatic inequalities in apparent success.

We have a responsibility to ensure that the use of prediction tools does not inhibit future discovery, marginalize underrepresented groups, exclude novel ideas, or discourage interdisciplinary work and the development of new fields.

Relying too heavily on those measures may have unintended consequences, much in the vein of a self-fulfilling prophecy (and similar to the filter bubble that we are seeing in other areas invaded by a reckless use of big data).

This widespread emphasis on predictable discoveries over unexpected ones breeds a different, more risk-averse scientist.

In other words, to what extent will the belief that we can “wisely” anticipate discoveries end up conditioning or constraining what we actually discover? Is it already happening?

A troubling trend, however, is the nearly annual declaration by a Nobel laureate that their biggest discovery would not have been possible in today’s research environment. The 2016 declaration came from Ohsumi, who decried the fact that “scientists are now increasingly required to provide evidence of immediate and tangible application of their work”.

To conclude, a humbling insight:

A more reliable engine for generating scientific discoveries may be to cultivate and maintain a healthy ecosystem of scientists rather than focus on predicting individual discoveries.


(1) Loreto, Vittorio, Vito D. P. Servedio, Steven H. Strogatz, and Francesca Tria. 2017. ‘Dynamics on Expanding Spaces: Modeling the Emergence of Novelties’, January. doi:10.1007/978-3-319-24403-7_5.

(2) Clauset, Aaron, Daniel B. Larremore, and Roberta Sinatra. 2017. ‘Data-Driven Predictions in the Science of Science’. Science 355 (6324): 477–80. doi:10.1126/science.aal4217.

(3) Way, Samuel F., Allison C. Morgan, Aaron Clauset, and Daniel B. Larremore. 2016. ‘The Misleading Narrative of the Canonical Faculty Productivity Trajectory’. arXiv:1612.08228 [Physics], December. http://arxiv.org/abs/1612.08228.

(4) Kuhn, Thomas S. 1996. The Structure of Scientific Revolutions. 3rd edition. Chicago, IL: The University of Chicago Press.
