Beyond Experiments


In engineering and the physical sciences, experiments are a primary component of the scientific method. In medicine and the social sciences, their role is more controversial, and their prevalence varies across disciplines.

In a paper published last February (1), the authors explain that scholars have begun to question the strong emphasis placed on experiments. Overemphasizing experiments can blind researchers to their many shortcomings, and experiments end up being overused to the detriment of scientific progress.

It is often claimed that only experiments can support strong causal inferences and therefore they should be privileged in the behavioral sciences. We disagree. Overvaluing experiments results in their overuse both by researchers and decision makers and in an underappreciation of their shortcomings. Neglect of other methods often follows. Experiments can suggest whether X causes Y in a specific experimental setting; however, they often fail to elucidate either the mechanisms responsible for an effect or the strength of an effect in everyday natural settings. In this article, we consider two overarching issues. First, experiments have important limitations. We highlight problems with external, construct, statistical-conclusion, and internal validity; replicability; and conceptual issues associated with simple X causes Y thinking. Second, quasi-experimental and nonexperimental methods are absolutely essential. As well as themselves estimating causal effects, these other methods can provide information and understanding that goes beyond that provided by experiments. A research program progresses best when experiments are not treated as privileged but instead are combined with these other methods.

Diener, Ed, Robert Northcott, Michael J. Zyphur, and Stephen G. West. ‘Beyond Experiments’. Perspectives on Psychological Science, 24 February 2022, 17456916211037670.

Demonstrating and understanding a causal connection is not a discrete yes-or-no event. It is instead a process of accumulating different types of evidence that complement one another. To this end, the authors make the following recommendations:

  1. Wording matters. Descriptions of experimental outcomes should be worded carefully with the qualifications clearly stated.
  2. All methods are based on assumptions. When assumptions cannot be tested explicitly, that must be acknowledged.
  3. Research programs in the human sciences must use multiple methods.
  4. Researchers must see experiments as only one method of causal inference among many.
  5. External validity and construct validity should be considered from the start.
  6. Experimental manipulations need to be validated to establish construct validity.
  7. Where possible, researchers should conduct conceptual replications in which the putative theoretical independent variable is manipulated in several different ways and the theoretical dependent variable is measured in several different ways.
  8. To discover underlying mechanisms and structures, non-experimental methods will usually be helpful and superior to experiments.

One of the authors cited in the paper, Judea Pearl, who has extensively studied the theory of causal and counterfactual inference, illustrates the role of experiments with his image of the ladder of causation in The Book of Why.

Judea Pearl, The Book of Why. The Ladder of Causation, with representative organisms at each level. Most animals as well as present-day learning machines are on the first rung, learning from association. Tool users, such as early humans, are on the second rung, if they act by planning and not merely by imitation. We can also use experiments to learn the effects of interventions, and presumably this is how babies acquire much of their causal knowledge. On the top rung, counterfactual learners can imagine worlds that do not exist and infer reasons for observed phenomena.

(…) my research on machine learning has taught me that there are at least three distinct levels that need to be conquered by a causal learner: seeing, doing, and imagining.

The second ability, doing, stands for predicting the effect(s) of deliberate alterations of the environment, and choosing among these alterations to produce a desired outcome. Only a small handful of species have demonstrated elements of this skill. Usage of tools, provided they are designed for a purpose and not just picked up by accident or copied from one’s ancestors, could be taken as a sign of reaching this second level. Yet even tool users do not necessarily possess a “theory” of their tool that tells them why their tool works and what to do when it doesn’t. For that, you need to be at a level of understanding that permits imagining.
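Pearl's first two rungs, seeing and doing, can be made concrete with a toy simulation. The sketch below is purely hypothetical: it assumes a binary confounder `z` that raises both the chance of receiving a treatment `x` and the outcome `y`, with a true effect of `x` on `y` of 1.0. "Seeing" compares outcomes across observed values of `x`; "doing" assigns `x` by coin flip, as a randomized experiment would. The model, its coefficients, and the function names `seeing`, `doing` and `outcome` are illustrative assumptions, not taken from the paper or from The Book of Why.

```python
import random
from statistics import mean

# Hypothetical toy model (an assumption for illustration only):
#   z: binary confounder with P(z=1) = 0.5
#   x: binary treatment
#   y = TRUE_EFFECT * x + CONFOUNDER_EFFECT * z + Gaussian noise
TRUE_EFFECT = 1.0
CONFOUNDER_EFFECT = 2.0


def outcome(x: int, z: int, rng: random.Random) -> float:
    """Structural equation for y in the toy model."""
    return TRUE_EFFECT * x + CONFOUNDER_EFFECT * z + rng.gauss(0.0, 1.0)


def seeing(n: int, rng: random.Random) -> float:
    """Rung 1: difference in mean y across *observed* x, where z influences x."""
    groups = {0: [], 1: []}
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # z pushes units toward treatment: P(x=1|z=1)=0.8, P(x=1|z=0)=0.2
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0
        groups[x].append(outcome(x, z, rng))
    return mean(groups[1]) - mean(groups[0])


def doing(n: int, rng: random.Random) -> float:
    """Rung 2: set x by coin flip, i.e. do(x), which cuts the z -> x link."""
    groups = {0: [], 1: []}
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < 0.5 else 0  # randomized assignment ignores z
        groups[x].append(outcome(x, z, rng))
    return mean(groups[1]) - mean(groups[0])


if __name__ == "__main__":
    rng = random.Random(42)
    print(f"seeing (observational difference): {seeing(100_000, rng):.2f}")
    print(f"doing  (randomized difference):    {doing(100_000, rng):.2f}")
```

In this toy setting the observational difference has an expected value of 1.0 + 2.0 × (0.8 − 0.2) = 2.2, while the randomized difference lands near the true 1.0. The experiment still says nothing about why `x` affects `y`, or whether the effect of 1.0 would hold outside this simulated population; that is the territory of the third rung, imagining.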

The trend in recent philosophy of science and economics is towards a balanced view. Experiments have strengths but also substantial weaknesses… And one thing is clear:

If funding and prestige are directed primarily to areas in which experiments can be conducted easily, the inevitable result will be a biased agenda, unhealthily distorting what kind of science is done.

Do not allow science policy (bureaucracy) to clip the wings of imagination.


(1) Ed Diener, nicknamed Dr. Happiness for his fundamental research on the subject and cited more than 280,000 times, passed away one year ago, in April 2021. Remembering Ed Diener

Featured Image: Couples In Chemistry Beakers. Vintage illustration of couples walking inside chemistry beakers in front of a chemical processing plant, 1952. Screen print. (Illustration by GraphicaArtis/Getty Images)
