
Chapter 2: The Scientific Method

Is there a single method that defines science?

“The scientific method” is perhaps the most commonly invoked phrase in popular discussions of science. Textbooks routinely present a neat sequence — observe, hypothesize, predict, test, conclude — as if this were the universal recipe for scientific knowledge. But is there really a single method that all sciences share? Philosophers of science have debated this question intensely, and the answer is far from straightforward.

This chapter examines the major candidates for “the” scientific method, from the hypothetico-deductive model to inference to the best explanation and Bayesian reasoning. We then confront Paul Feyerabend’s radical challenge: that there is no fixed method, and that the history of science shows that “anything goes.” Finally, we consider how method varies across the sciences and what the contemporary replication crisis tells us about scientific methodology.

2.1 The Hypothetico-Deductive Method

The hypothetico-deductive (HD) method is the most widely discussed model of scientific reasoning. It was developed in various forms by William Whewell, William Stanley Jevons, Karl Popper, and Carl Hempel. The basic structure is:

  1. Formulate a hypothesis H that might explain the phenomenon of interest.
  2. Derive a prediction P from H (possibly in conjunction with auxiliary assumptions A).
  3. Test the prediction by observation or experiment.
  4. If P is observed, H is confirmed (to some degree). If not-P is observed, H is disconfirmed (or the auxiliaries are blamed).

The HD method captures much of what scientists actually do: they propose theories, derive testable consequences, and check those consequences against the world. Its logical structure is clear and rigorous. However, it faces several well-known problems:

  • The tacking problem: If H entails P, then (H & Q) also entails P, for any arbitrary Q. So confirming P also confirms (H & “the moon is made of cheese”). This means the HD method confirms too much.
  • The problem of alternative hypotheses: Many different hypotheses may entail the same prediction. Confirming P does not tell us which hypothesis is correct. This is the underdetermination problem.
  • No account of discovery: The HD method tells us how to test hypotheses, but it says nothing about where hypotheses come from. Popper frankly admitted that the “context of discovery” was a matter of psychology, not logic.
  • The Duhem-Quine problem: Predictions are derived from H plus auxiliary assumptions A. If P fails, we can always save H by modifying A. So the HD method alone cannot determine which part of the theory is at fault.

“The method of science is the method of bold conjectures and ingenious and severe attempts to refute them.”— Karl Popper, Objective Knowledge (1972)
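The tacking problem can be checked mechanically. The sketch below is a toy setup of my own (the variable layout and function names are illustrative): it brute-forces truth tables over three propositional variables to confirm that whenever H entails P, the conjunction of H with an arbitrary Q entails P as well.

```python
from itertools import product

def entails(premise, conclusion, n_vars=3):
    """Check semantic entailment by brute force: the conclusion must hold
    in every truth assignment that satisfies the premise."""
    return all(conclusion(v)
               for v in product([False, True], repeat=n_vars)
               if premise(v))

# Variables: v = (H, Q, P). We suppose H entails P, modeled here as the
# background constraint H -> P.
constraint = lambda v: (not v[0]) or v[2]   # H -> P

h_entails_p = entails(lambda v: v[0] and constraint(v),
                      lambda v: v[2])
hq_entails_p = entails(lambda v: v[0] and v[1] and constraint(v),
                       lambda v: v[2])

print(h_entails_p)   # True: H (given H -> P) entails P
print(hq_entails_p)  # True: the irrelevant conjunct Q rides along
```

Since (H & Q) inherits all of H's predictions, any confirmation P gives to H it also gives to the tacked-on conjunction, which is exactly the worry.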

2.2 Inference to the Best Explanation (Abduction)

Inference to the Best Explanation (IBE), also called abduction (following C.S. Peirce), is an alternative model of scientific reasoning. The basic idea, articulated by Gilbert Harman (1965) and developed by Peter Lipton (2004), is:

Given a set of data D and a set of candidate explanations {H1, H2, ..., Hn}, we should infer the hypothesis that would, if true, provide the best explanation of D.

But what makes an explanation “best”? Lipton distinguished between the likeliest explanation (the one most probable given the evidence) and the loveliest explanation (the one that would, if true, provide the deepest understanding). Scientists, Lipton argued, characteristically pursue loveliness rather than mere likelihood.

Criteria for explanatory goodness typically include:

  • Unification: explains diverse phenomena with a single mechanism.
  • Simplicity: invokes fewer independent assumptions.
  • Precision: makes specific, quantitative predictions.
  • Fertility: opens new lines of research.
  • Consistency: coheres with established knowledge.
  • Mechanism: specifies a causal process.

“The inference to the best explanation is the basis of most scientific reasoning and indeed of most reasoning in everyday life.”— Gilbert Harman, “The Inference to the Best Explanation” (1965)

IBE has been criticized by Bas van Fraassen, who argues that the best available explanation may still be a bad explanation. The fact that a hypothesis explains the data better than its competitors does not mean it is true — the true hypothesis may not have been considered. Van Fraassen calls this the “argument from a bad lot.” Others (including Larry Laudan) have argued that the history of science is littered with explanatorily superior theories that turned out to be false.
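One crude way to make "best" precise (purely illustrative, not Lipton's own proposal) is a penalized-likelihood score that rewards fit to the data and penalizes complexity, in the spirit of the simplicity criterion. All numbers below are hypothetical.

```python
import math

def explanation_score(log_likelihood, n_params, n_data):
    """Hypothetical scoring rule: reward fit, penalize complexity.
    This is a BIC-style penalty, one of many possible choices."""
    return log_likelihood - 0.5 * n_params * math.log(n_data)

# Three toy candidate explanations of the same 100 data points:
candidates = {
    "H1: simple mechanism":  explanation_score(-120.0, n_params=2,  n_data=100),
    "H2: complex mechanism": explanation_score(-115.0, n_params=10, n_data=100),
    "H3: middling":          explanation_score(-118.0, n_params=4,  n_data=100),
}
best = max(candidates, key=candidates.get)
print(best)  # "H1: simple mechanism" wins despite a worse raw fit
```

Note how the sketch also makes van Fraassen's worry vivid: the scorer only ranks the candidates it is handed. If the true hypothesis is not in the lot, the "best" explanation may still be a bad one.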

2.3 The Role of Experiments, Controls, and Replication

Experimentation is often seen as the hallmark of science. The controlled experiment — in which a single variable is manipulated while all others are held constant — is a powerful tool for establishing causal relationships. But the philosophical analysis of experimentation reveals surprising complexities.

The Logic of Controlled Experiments

The ideal controlled experiment involves a treatment group (exposed to the variable of interest) and a control group (identical in all respects except the variable). Any observed difference can then be attributed to the variable. Randomized controlled trials (RCTs) use random assignment to ensure that known and unknown confounders are distributed equally between groups.
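The balancing effect of random assignment can be shown with a small simulation (the subject pool and confounder here are invented for illustration):

```python
import random
import statistics

random.seed(0)

# Hypothetical subject pool with an unmeasured confounder (say, age).
ages = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment: each subject flips a fair coin.
treatment, control = [], []
for age in ages:
    (treatment if random.random() < 0.5 else control).append(age)

# With large n, the confounder's mean is nearly equal across the two
# groups, so any outcome difference can be credited to the treatment.
gap = abs(statistics.mean(treatment) - statistics.mean(control))
print(round(gap, 2))  # small relative to the population sd of 10
```

The point generalizes: randomization balances confounders the experimenter has never even thought to measure, which is why RCTs carry the evidential weight they do.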

However, Ian Hacking and others have argued that experimentation is not merely a method of testing theories. Experiments have a “life of their own” — they can reveal new phenomena, create new entities (e.g., Bose-Einstein condensates), and generate data that outrun existing theoretical frameworks. As Hacking put it: “Experimentation has many lives of its own.”

Replication

Replication is widely regarded as a cornerstone of scientific methodology. If a finding is genuine, it should be reproducible by independent researchers. But what counts as a “replication”? Philosophers distinguish between:

  • Direct replication: Repeating the same experiment with the same methods and materials.
  • Conceptual replication: Testing the same hypothesis with different methods or materials.
  • Systematic replication: Varying conditions systematically to determine the boundary conditions of an effect.

Harry Collins has argued that replication is never straightforward. What counts as a “competent” replication is itself a matter of judgment, and disputes about failed replications often turn on questions about tacit knowledge and experimental skill. Collins calls this the “experimenter’s regress”: the quality of an experiment is judged by whether it gets the “right” result, but the right result is what a good experiment reveals.

2.4 Feyerabend’s “Against Method”

Paul Feyerabend (1924–1994) mounted the most radical challenge to the idea of a scientific method. In Against Method (1975), he argued that every methodological rule that has been proposed — “rely on observation,” “reject falsified theories,” “use controlled experiments” — has been violated at some point in the history of science, and that the violation was essential for scientific progress.

“The only principle that does not inhibit progress is: anything goes.”— Paul Feyerabend, Against Method (1975), p. 23

Feyerabend’s central case study is Galileo’s defense of the Copernican heliocentric model. Feyerabend argued that Galileo succeeded not by following any method but by breaking every rule in the book:

  • He contradicted observation: The Copernican model predicted stellar parallax, which was not observed (it was too small to detect with existing instruments). Galileo simply dismissed this inconvenient fact.
  • He used ad hoc hypotheses: Galileo’s theory of the tides (offered as evidence for the Earth’s motion) was incorrect.
  • He relied on propaganda: Galileo used literary skill, ridicule, and political maneuvering as much as argument and evidence.
  • He used untested instruments: The telescope had not been independently calibrated for astronomical use. Galileo assumed it was reliable without proof.

Feyerabend’s position is often called epistemological anarchism. He did not advocate irrationality; rather, he argued that rationality cannot be captured in a fixed set of rules. Different problems require different approaches, and the attempt to impose a single method on all of science is both historically inaccurate and potentially harmful to scientific progress.

Feyerabend also argued that the methodological monopoly of science has political consequences. In Science in a Free Society (1978), he suggested that citizens in a democratic society should have the right to choose between scientific and non-scientific approaches (e.g., in medicine), just as they choose between political parties. This made him deeply unpopular among fellow philosophers of science, who accused him of irresponsible relativism.

“The world which we want to explore is a largely unknown entity. We must, therefore, keep our options open and we must not restrict ourselves in advance.”— Paul Feyerabend, Against Method (1975)

2.5 Method in Different Sciences

Even if we reject Feyerabend’s extreme position, there are genuine differences in methodology across the sciences. The idealized picture of controlled experimentation fits some sciences much better than others.

  • Physics. Primary methods: controlled experiments, mathematical modeling, thought experiments. Distinctive challenges: extreme conditions; reliance on expensive apparatus.
  • Biology. Primary methods: field observation, lab experiments, comparative method, phylogenetic analysis. Distinctive challenges: complexity; historical contingency; ethical constraints.
  • Geology / Paleontology. Primary methods: fieldwork, stratigraphic analysis, inference from traces. Distinctive challenges: non-repeatable events; deep time; incomplete record.
  • Psychology. Primary methods: experiments, surveys, case studies, neuroimaging. Distinctive challenges: subjectivity; demand characteristics; WEIRD samples.
  • Economics. Primary methods: econometric modeling, natural experiments, agent-based simulation. Distinctive challenges: reflexivity; inability to perform controlled experiments on whole economies.
  • Cosmology. Primary methods: observational astronomy, computer simulation, model comparison. Distinctive challenges: single-case problem; no experimental control; observer selection effects.

These differences raise important philosophical questions. Is physics the model that all sciences should aspire to (“physics envy”)? Or are the methods of biology, psychology, and the social sciences legitimate in their own right? The debate between methodological monism (there is one scientific method) and methodological pluralism (different sciences require different methods) remains active.

“The physicist can, at best, give the philosopher only a partial solution to his problem: for his methods are of no use in biology or psychology. The biologist needs different methods, and the psychologist still others.”— Ernst Mach, The Science of Mechanics (1883)

2.6 Bayesian Reasoning in Science

In recent decades, Bayesian confirmation theory has emerged as a powerful framework for understanding scientific reasoning. On the Bayesian view, scientific reasoning is a matter of updating probabilities in light of evidence, using Bayes’ theorem:

\(P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}\)

The posterior probability of a hypothesis given the evidence equals the likelihood of the evidence given the hypothesis, times the prior probability of the hypothesis, divided by the total probability of the evidence.
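As a minimal sketch (with hypothetical numbers), repeated application of Bayes' theorem shows how accumulating evidence drives the posterior:

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem, with P(E) expanded by the law of total probability."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Hypothetical numbers: H starts with prior 0.1, and each independent
# piece of evidence is 9x more likely if H is true than if it is false.
p = 0.1
for _ in range(3):
    p = posterior(p, p_e_given_h=0.9, p_e_given_not_h=0.1)
    print(round(p, 3))  # 0.5, then 0.9, then 0.988
```

Three pieces of modestly favorable evidence lift an initially implausible hypothesis to near-certainty, which is the Bayesian picture of confirmation in miniature. The choice of the 0.1 prior, of course, is exactly where the subjectivity worry enters.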

Bayesianism has several virtues as a model of scientific reasoning. It provides a precise, quantitative account of confirmation. It naturally handles probabilistic hypotheses. It explains why diverse evidence is more confirming than repetitive evidence. And it unifies the HD method and IBE as special cases.

However, Bayesianism faces its own challenges. The choice of prior probabilities is often regarded as subjective, raising concerns about objectivity. The computational demands of Bayesian updating are enormous for realistic scientific problems. And some philosophers (e.g., Deborah Mayo) argue that Bayesian reasoning cannot capture the logic of severe testing that is central to experimental science.

2.7 The Reproducibility Crisis

Since 2011, science has been rocked by the “replication crisis” (or “reproducibility crisis”). Large-scale replication projects have found that a substantial proportion of published results cannot be reproduced:

  • Psychology: The Open Science Collaboration (2015) attempted to replicate 100 studies published in top psychology journals. Only 36% yielded statistically significant results on replication.
  • Cancer biology: The Reproducibility Project: Cancer Biology found that only a fraction of high-profile preclinical studies could be replicated.
  • Economics: Camerer et al. (2016) replicated 18 experimental economics studies; 61% replicated.

The crisis has been attributed to several methodological factors:

  • P-hacking: analyzing data in multiple ways until a statistically significant result appears.
  • Publication bias: journals preferentially publish positive results; negative results go unreported.
  • Low statistical power: small sample sizes that inflate effect sizes and produce spurious results.
  • HARKing (Hypothesizing After Results are Known): presenting post-hoc findings as predictions.
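The inflation of false positives by p-hacking is easy to simulate. The sketch below (a toy setup; all numbers are illustrative) compares an honest single test with a strategy that tests five outcome measures and reports a finding if any one is significant, all under a true null effect:

```python
import random
import math

random.seed(1)

def significant(n=20):
    """One study of a null effect: draw n points from N(0, 1) and run a
    two-sided z-test of zero mean at alpha = 0.05."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)
    return abs(z) > 1.96

trials = 10_000
# Honest practice: one pre-specified test per study.
honest = sum(significant() for _ in range(trials)) / trials
# P-hacked: try five outcome measures, report if ANY comes out significant.
hacked = sum(any(significant() for _ in range(5)) for _ in range(trials)) / trials

print(round(honest, 3))  # near the nominal 0.05
print(round(hacked, 3))  # roughly 1 - 0.95**5, about 0.23
```

Even with no real effect anywhere, the flexible analyst "finds" something in roughly a quarter of studies, which is why pre-registration (fixing the analysis before seeing the data) is among the central reforms.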

The crisis raises deep philosophical questions. Does it show that “the scientific method” has failed? Or does it show that science is self-correcting — that the crisis was discovered and is being addressed by scientists themselves? Reforms such as pre-registration of studies, registered reports, and open data practices represent methodological improvements that would not have occurred without the crisis.

“The first principle is that you must not fool yourself — and you are the easiest person to fool.”— Richard Feynman, “Cargo Cult Science” (1974)

2.8 Assessment: Is There a Scientific Method?

The philosophical consensus, insofar as there is one, is that there is no single “scientific method” in the sense of a fixed algorithm that scientists follow. But this does not mean that anything goes. Several important insights have emerged from the debate:

  • Methodological pluralism: Different sciences, and different problems within the same science, call for different methods. The unity of science lies not in a shared method but in shared epistemic values (accuracy, consistency, scope, simplicity, fruitfulness).
  • Overlapping strategies: While there is no single method, there are overlapping strategies — hypothesis testing, controlled experimentation, statistical analysis, peer review, replication — that characterize science as a whole.
  • Method as evolving: Scientific methods are not fixed but evolve over time. The development of the randomized controlled trial, the double-blind experiment, and Bayesian statistics represent genuine methodological progress.
  • The social dimension: Scientific methods are not just individual cognitive procedures but social practices embedded in institutions (peer review, funding agencies, professional societies) that shape what counts as good science.

Key Takeaways

  1. The hypothetico-deductive method captures important features of scientific reasoning but faces the tacking problem, the underdetermination problem, and the Duhem-Quine problem.
  2. Inference to the Best Explanation offers a richer account of theory choice but faces the “argument from a bad lot.”
  3. Bayesian reasoning provides a precise quantitative framework but raises concerns about subjective priors.
  4. Feyerabend’s “anything goes” is a provocation, not a positive thesis; it is best understood as a warning against methodological dogmatism.
  5. Different sciences use different methods; methodological pluralism is the norm, not the exception.
  6. The replication crisis reveals genuine methodological problems but also demonstrates science’s capacity for self-correction.
  7. Scientific methods are not fixed but evolve; they are social practices as much as logical procedures.