Chapter 19: Bayesian Confirmation Theory

The dominant framework for understanding how evidence supports hypotheses in science, grounded in the probability calculus and Bayes’ theorem.

Bayesian confirmation theory is the most influential and widely adopted account of how evidence bears on scientific hypotheses. Its central claim is disarmingly simple: evidence $E$ confirms hypothesis $H$ if and only if learning $E$ raises the probability of $H$. Formally:

$$E \text{ confirms } H \iff P(H|E) > P(H)$$

This definition, combined with the machinery of Bayes’ theorem, yields a powerful and general theory of confirmation that can handle a remarkable range of problems — from the ravens paradox to the design of clinical trials. Bayesianism has become so dominant that it is sometimes called the “new orthodoxy” in confirmation theory. Yet it faces deep philosophical challenges that continue to drive research.

In this chapter, we develop the Bayesian framework from its foundations, examine its major achievements, and confront the most serious objections it faces. The goal is not merely to learn the formalism but to understand the philosophical assumptions on which it rests and the limits of what it can achieve.

Bayes’ Theorem and Its Components

The engine of Bayesian confirmation theory is Bayes’ theorem, a straightforward consequence of the axioms of probability:

$$P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}$$

Each component plays a distinct philosophical role:

  • $P(H)$ — The Prior Probability: The degree of belief in $H$ before the evidence $E$ is obtained. This is the most controversial element, since it introduces an apparently subjective element into confirmation.
  • $P(E|H)$ — The Likelihood: The probability of observing $E$ if $H$ is true. This is often the most straightforward component, especially when $H$ makes precise quantitative predictions.
  • $P(E)$ — The Marginal Likelihood (Evidence): The total probability of observing $E$, averaging over all possible hypotheses. This can be expanded as: $P(E) = P(E|H)P(H) + P(E|\neg H)P(\neg H)$.
  • $P(H|E)$ — The Posterior Probability: The updated degree of belief in $H$ after learning $E$. This is the output of the Bayesian updating process.

“Given some data, it’s trivial to calculate the probability of the data given any hypothesis. The problem is to go the other way: to figure out the probability of the hypothesis given the data. That’s what Bayes’ theorem does.” — Sharon Bertsch McGrayne, The Theory That Would Not Die (2011)
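The four components can be put to work in a few lines. The following is a minimal sketch with hypothetical numbers (a rare condition and an imperfect diagnostic test), showing how the marginal likelihood is expanded and the posterior computed:

```python
# Toy diagnostic example (hypothetical numbers): H = "patient has the condition".
P_H = 0.01             # prior P(H)
P_E_given_H = 0.95     # likelihood P(E|H): probability of a positive test if H is true
P_E_given_notH = 0.05  # P(E|~H): false-positive rate

# Marginal likelihood via the expansion P(E) = P(E|H)P(H) + P(E|~H)P(~H)
P_E = P_E_given_H * P_H + P_E_given_notH * (1 - P_H)

# Bayes' theorem: posterior P(H|E)
P_H_given_E = P_E_given_H * P_H / P_E

print(round(P_H_given_E, 4))  # the evidence raises P(H) from 0.01 to ~0.16
```

Note that the posterior, while much higher than the prior, is still far from certainty: the low prior restrains what even strong evidence can establish.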

An alternative and sometimes more illuminating form expresses the theorem in terms of the likelihood ratio and prior odds:

$$\underbrace{\frac{P(H|E)}{P(\neg H|E)}}_{\text{Posterior Odds}} = \underbrace{\frac{P(E|H)}{P(E|\neg H)}}_{\text{Likelihood Ratio}} \times \underbrace{\frac{P(H)}{P(\neg H)}}_{\text{Prior Odds}}$$

This form makes clear that evidence confirms $H$ precisely when the likelihood ratio $P(E|H)/P(E|\neg H)$ is greater than 1 — that is, when the evidence is more probable under $H$ than under its negation.
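The odds form can be checked numerically. Reusing the same hypothetical numbers as above, the sketch below multiplies prior odds by the likelihood ratio and converts the result back to a probability:

```python
# Same hypothetical numbers as before, as a sanity check of the odds form.
P_H, P_E_H, P_E_nH = 0.01, 0.95, 0.05

prior_odds = P_H / (1 - P_H)
likelihood_ratio = P_E_H / P_E_nH          # 19 > 1, so E confirms H
posterior_odds = likelihood_ratio * prior_odds

# Convert odds back to a probability: p = odds / (1 + odds)
P_H_given_E = posterior_odds / (1 + posterior_odds)
print(round(P_H_given_E, 4))
```

The result agrees with the direct application of Bayes' theorem, as it must, since the odds form is an algebraic rearrangement of the same identity.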

Confirmation as Probability-Raising

The core Bayesian account identifies confirmation with probability-raising. Evidence $E$ confirms hypothesis $H$ relative to background knowledge $K$ if and only if:

$$P(H|E \wedge K) > P(H|K)$$

Equivalently, by Bayes’ theorem, $E$ confirms $H$ if and only if $P(E|H \wedge K) > P(E|K)$ — that is, if $H$ makes $E$ more probable than $E$ is on background knowledge alone. This gives us a natural relevance criterion: evidence confirms a hypothesis when the hypothesis makes that evidence more expected.

Several measures of degree of confirmation have been proposed:

Difference measure: $d(H,E) = P(H|E) - P(H)$

Ratio measure: $r(H,E) = \frac{P(H|E)}{P(H)}$

Likelihood ratio: $l(H,E) = \frac{P(E|H)}{P(E|\neg H)}$

Log-likelihood ratio: $L(H,E) = \log \frac{P(E|H)}{P(E|\neg H)}$

These measures do not always agree on which of two pieces of evidence provides stronger confirmation. The choice among them is itself a philosophical question with implications for how we understand the confirmatory process. Fitelson (1999) has argued extensively for the likelihood ratio measure on the grounds that it satisfies the most desirable formal constraints.
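For a single piece of evidence the four measures always agree on the qualitative question (whether confirmation occurs); what they can disagree about is the comparative strength of different pieces of evidence. A minimal sketch with hypothetical probabilities:

```python
import math

# Four confirmation measures for one hypothetical case.
P_H, P_E_H, P_E_nH = 0.3, 0.8, 0.2

P_E = P_E_H * P_H + P_E_nH * (1 - P_H)   # marginal likelihood
P_H_E = P_E_H * P_H / P_E                # posterior

d = P_H_E - P_H        # difference measure
r = P_H_E / P_H        # ratio measure
l = P_E_H / P_E_nH     # likelihood ratio (Fitelson's preferred measure); here = 4
L = math.log(l)        # log-likelihood ratio

print(d > 0, r > 1, l > 1, L > 0)  # all agree that E confirms H
```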

Prior Probabilities: Subjective vs Objective Bayesianism

The most philosophically contentious aspect of Bayesianism is the role of prior probabilities. Where do they come from? What constrains them? The answer to this question divides Bayesians into two broad camps.

Subjective Bayesianism

Subjective Bayesians, following de Finetti and Savage, hold that prior probabilities represent an agent’s personal degrees of belief. The only constraint on priors is coherence — they must satisfy the axioms of probability. Beyond that, any prior is permissible. The key insight is that coherent agents who update by conditionalization will tend to converge in their posterior probabilities as evidence accumulates.

“Probability does not exist.” — Bruno de Finetti, opening line of Theory of Probability (1974)

De Finetti’s provocative slogan encapsulates the subjectivist position: probabilities are not features of the external world but expressions of rational credence. The famous Dutch book argument provides a pragmatic justification: an agent whose degrees of belief violate the probability axioms can be shown to accept a set of bets that guarantees a net loss. Coherence is thus a requirement of practical rationality.

Objective Bayesianism

Objective Bayesians seek additional constraints on priors beyond mere coherence. The most influential approach appeals to the principle of maximum entropy (MaxEnt), developed by E.T. Jaynes. The idea is that, in the absence of specific information, one should adopt the prior that is maximally noncommittal — the one with the highest entropy:

$$H(\mathbf{p}) = -\sum_{i} p_i \ln p_i$$
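The MaxEnt recommendation can be illustrated directly. The sketch below (with arbitrary example distributions) computes the entropy of three priors over four outcomes; the uniform distribution, being maximally noncommittal, scores highest:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i ln p_i (with 0 ln 0 taken as 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed  = [0.70, 0.10, 0.10, 0.10]
certain = [1.00, 0.00, 0.00, 0.00]

# Among distributions over four outcomes, the uniform prior has maximal entropy.
print(entropy(uniform) > entropy(skewed) > entropy(certain))
```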

Jon Williamson has developed a sophisticated version of objective Bayesianism that combines three norms: (1) probabilities should be calibrated to known frequencies; (2) they should satisfy the constraints imposed by evidence; and (3) subject to those constraints, entropy should be maximised.

The debate between subjective and objective Bayesianism mirrors deeper questions about the nature of scientific objectivity. If scientific reasoning rests on subjective priors, how can science claim objectivity? The subjectivist responds that objectivity lies not in the starting point but in the method: Bayesian conditionalization is a uniquely rational way to learn from experience, and differences in priors wash out as evidence accumulates.

The Problem of Old Evidence

Clark Glymour (1980) identified a serious challenge for Bayesian confirmation theory: the problem of old evidence. Consider Einstein’s general theory of relativity and the anomalous precession of Mercury’s perihelion. The precession was well known before Einstein developed his theory. Yet when Einstein showed that general relativity predicted the correct precession, this was widely regarded as strong confirmation of the theory.

The Bayesian has difficulty explaining this. If the evidence $E$ (Mercury’s precession) is already known, then $P(E) = 1$. But then, by Bayes’ theorem:

$$P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)} = \frac{1 \cdot P(H)}{1} = P(H)$$

If $P(E) = 1$ and $P(E|H) = 1$ (since the theory entails the known evidence), the posterior equals the prior — no confirmation occurs. Yet intuitively, the ability of general relativity to account for the precession of Mercury was a reason to believe the theory.
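The triviality of the update is easy to exhibit. With a hypothetical prior for the theory, setting $P(E) = 1$ and $P(E|H) = 1$ leaves the posterior exactly equal to the prior:

```python
# When E is already known, P(E) = 1, and (with P(E|H) = 1) updating is idle.
P_H = 0.2          # hypothetical prior for the theory
P_E = 1.0          # old evidence: the precession is already certain
P_E_given_H = 1.0  # the theory entails the known evidence

P_H_given_E = P_E_given_H * P_H / P_E
print(P_H_given_E == P_H)  # True: posterior equals prior, no confirmation
```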

Several responses have been offered. Garber (1983) proposed that what gets confirmed is not $H$ simpliciter but the logical proposition that $H$ entails $E$. Learning that your theory has a previously unrecognised consequence does raise its probability. Howson (1991) argued that the Bayesian should counterfactually evaluate what the agent’s probability function would have been had they not known $E$, and use that hypothetical prior to assess confirmation.

The problem of old evidence reveals a fundamental tension in the Bayesian framework between its formal apparatus and the historical dynamics of actual scientific reasoning.

Convergence Theorems: Washing Out the Priors

A crucial defence of Bayesianism against the charge of subjectivity appeals to convergence theorems. These mathematical results show that, under certain conditions, Bayesian agents who start with different priors will converge toward the same posterior probabilities as they accumulate evidence.

The most important convergence result is the merging of opinions theorem (Blackwell and Dubins, 1962). It states that if two probability measures $P_1$ and $P_2$ are absolutely continuous with respect to each other (meaning they agree on which events have probability zero), then their conditional probabilities converge almost surely as evidence accumulates:

$$\|P_1(\cdot | E_1, \ldots, E_n) - P_2(\cdot | E_1, \ldots, E_n)\| \to 0 \text{ as } n \to \infty$$

This result is philosophically significant because it suggests that the choice of prior does not matter in the long run — the evidence will eventually swamp any reasonable prior. However, critics have noted several limitations:

  • The condition of absolute continuity is non-trivial: agents who assign probability zero to the true state of the world will never converge.
  • Convergence is guaranteed only in the limit — it says nothing about the rate of convergence, which may be very slow.
  • In realistic cases with finitely many observations, priors can make a decisive difference to the posterior.

“In the long run we are all dead.” — John Maynard Keynes, A Tract on Monetary Reform (1923)

Keynes’s quip, though made in an economic context, captures the force of the objection. The promise of long-run convergence may be cold comfort when we need to make decisions now on the basis of finite evidence.
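The merging of opinions can be illustrated with a small simulation, a sketch under toy assumptions: a coin of unknown bias, and two agents with conjugate Beta priors (one uniform, one strongly biased toward heads) updating on the same flips:

```python
import random

random.seed(0)

# Two agents with different Beta priors over a coin's bias update on the same flips.
# Beta(a, b) posterior mean after h heads in n flips is (a + h) / (a + b + n).
a1, b1 = 1, 1      # agent 1: uniform prior
a2, b2 = 10, 2     # agent 2: strongly biased toward heads

true_p = 0.3
h = n = 0
for _ in range(5000):
    h += random.random() < true_p
    n += 1

mean1 = (a1 + h) / (a1 + b1 + n)
mean2 = (a2 + h) / (a2 + b2 + n)
print(abs(mean1 - mean2) < 0.01)  # the posteriors have nearly merged
```

Note that the merging here is driven entirely by the shared data term $h/n$; after 5,000 flips the differing prior parameters contribute almost nothing, which is exactly the "washing out" the theorems describe.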

The Ravens Paradox and Bayesian Resolution

Hempel’s ravens paradox (1945) is one of the most famous puzzles in confirmation theory. Consider the hypothesis:

$$H: \text{All ravens are black} \quad \equiv \quad \forall x (Rx \to Bx)$$

By the equivalence condition, $H$ is logically equivalent to $\forall x (\neg Bx \to \neg Rx)$ — “all non-black things are non-ravens.” By Nicod’s condition, a hypothesis of the form “All $F$s are $G$s” is confirmed by instances of $F$s that are $G$s. Together, these conditions imply that a white shoe confirms “All ravens are black” — a deeply counterintuitive result.

The Bayesian resolution, developed by Hosiasson-Lindenbaum (1940) and elaborated by Howson and Urbach (2006), is elegant. The Bayesian accepts that a white shoe confirms the hypothesis — but argues that the degree of confirmation is negligibly small. The key insight is that the likelihood ratio $P(E|H)/P(E|\neg H)$ for observing a white shoe is barely above 1, because white shoes are overwhelmingly likely regardless of whether all ravens are black.

By contrast, observing a black raven provides substantial confirmation because the proportion of ravens in the total population of objects is tiny. The Bayesian analysis shows that the degree of confirmation from a positive instance depends on the relative sizes of the reference classes — a result that accords well with intuition and vindicates the practice of confirming generalisations by examining instances of the antecedent.
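A toy urn model (all numbers hypothetical) makes the asymmetry vivid. Suppose we sample a raven and find it black, versus sampling a non-black object and finding it is not a raven; under $H$ every non-black object is a non-raven, while under $\neg H$ we stipulate that some ravens are non-black:

```python
# Toy urn model (all numbers hypothetical): 100 ravens, 999,900 non-ravens,
# 90% of non-ravens non-black. Under ~H, suppose 50 of the 100 ravens are non-black.
non_ravens = 999_900
non_black_non_ravens = int(non_ravens * 0.9)
ravens = 100
non_black_ravens_if_notH = 50

# E1: sample a raven; it turns out black.
# P(E1|H) = 1; P(E1|~H) = fraction of ravens that are black under ~H.
lr_black_raven = 1.0 / ((ravens - non_black_ravens_if_notH) / ravens)  # = 2.0

# E2: sample a non-black object; it turns out not to be a raven.
# Under H every non-black object is a non-raven, so P(E2|H) = 1.
p_E2_given_notH = non_black_non_ravens / (non_black_non_ravens + non_black_ravens_if_notH)
lr_white_shoe = 1.0 / p_E2_given_notH

print(lr_black_raven, round(lr_white_shoe, 6))
```

The black raven yields a likelihood ratio of 2, while the white shoe yields a ratio barely above 1: confirmation occurs in both cases, but to wildly different degrees, just as the Bayesian resolution claims.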

“The Bayesian solution to the ravens paradox is one of the great success stories of confirmation theory.” — Colin Howson and Peter Urbach, Scientific Reasoning: The Bayesian Approach (2006)

The Grue Paradox and Projectibility

Nelson Goodman’s “new riddle of induction” (1955) poses a deeper challenge. Define a new predicate “grue”:

$$\text{grue}(x) \iff (\text{examined before } t \wedge \text{green}(x)) \vee (\text{not examined before } t \wedge \text{blue}(x))$$

All emeralds examined before time $t$ are both green and grue. So the evidence equally supports “All emeralds are green” and “All emeralds are grue.” Yet these hypotheses make incompatible predictions about emeralds examined after $t$. The problem is to explain why “green” is projectible (suitable for inductive generalization) and “grue” is not.

The Bayesian response is that the distinction lies in the prior probabilities. A rational agent should assign a lower prior to “All emeralds are grue” than to “All emeralds are green,” because the grue hypothesis posits an unexplained change at time $t$. But critics object that this merely relocates the problem: why should we assign lower priors to grue-like hypotheses? The Bayesian framework does not itself answer this question — it requires a substantive account of what makes some predicates natural and projectible.
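The structure of the Bayesian response is easy to formalise. Both hypotheses entail the evidence, so their likelihoods are equal and the posterior ratio simply reproduces whatever prior ratio we started with (the prior values below are illustrative assumptions):

```python
# Evidence: every emerald examined so far is green (hence also grue).
# Both hypotheses entail the evidence, so their likelihoods are equal (= 1).
prior_green = 0.10   # hypothetical prior: "All emeralds are green"
prior_grue  = 0.001  # hypothetical prior: grue-like hypotheses get less weight

likelihood = 1.0  # P(E | green-hypothesis) = P(E | grue-hypothesis) = 1

# With equal likelihoods, the posterior ratio just reproduces the prior ratio:
posterior_ratio = (likelihood * prior_green) / (likelihood * prior_grue)
print(posterior_ratio == prior_green / prior_grue)  # True: the priors do all the work
```

This is precisely the critics' point: the formalism faithfully propagates the prior asymmetry but offers no account of why the asymmetry should be there in the first place.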

“The problem of induction is not the problem of justifying inductive inference. It is the problem of defining ‘confirmation’ in a way that is not trivial.” — Nelson Goodman, Fact, Fiction, and Forecast (1955)

Goodman’s own solution appeals to entrenchment: projectible predicates are those that have a long history of successful use in inductive inferences. “Green” is entrenched; “grue” is not. This is a pragmatic, historical criterion rather than a logical one — a feature that some philosophers find satisfying and others find deeply troubling.

Bayesian vs Frequentist Approaches

The Bayesian approach to confirmation stands in sharp contrast to frequentist methods, which dominated 20th-century statistics and remain widely used in science. The philosophical differences are profound:

Bayesian

  • Probability = degree of belief
  • Can assign probabilities to hypotheses
  • Uses prior information explicitly
  • Answers: How probable is $H$ given $E$?
  • Updating via conditionalization

Frequentist

  • Probability = long-run frequency
  • Cannot assign probabilities to hypotheses
  • Uses only the data and sampling distribution
  • Answers: How probable is $E$ given $H$?
  • Error rates of procedures

The frequentist objects that $P(H)$ is meaningless: a hypothesis is either true or false; it does not have a probability. The Bayesian responds that probability is a measure of rational credence, not a physical frequency, and that science requires us to assess the plausibility of hypotheses in light of evidence.

The debate has practical consequences. Frequentist methods can reject a true hypothesis or fail to reject a false one in ways that seem irrational from a Bayesian perspective. Bayesian methods can be sensitive to the choice of prior in ways that seem arbitrary from a frequentist perspective. Chapter 21 examines these issues in the context of actual scientific practice.
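The contrast in what the two frameworks compute can be made concrete. The sketch below analyses 9 heads in 10 flips both ways; the point alternative $p = 0.8$ and the even prior odds are illustrative assumptions, chosen only to show that the two outputs answer different questions:

```python
from math import comb

# Data: 9 heads in 10 flips. H0: the coin is fair (p = 0.5).
n, k = 10, 9

# Frequentist: one-sided p-value P(X >= 9 | p = 0.5), a probability of the
# data (or more extreme data) given the hypothesis.
p_value = sum(comb(n, x) for x in range(k, n + 1)) / 2**n

# Bayesian: compare H0 against a hypothetical point alternative p = 0.8,
# with even prior odds; the output is a probability of a hypothesis.
lik_H0 = comb(n, k) * 0.5**k * 0.5**(n - k)
lik_H1 = comb(n, k) * 0.8**k * 0.2**(n - k)
posterior_H1 = lik_H1 / (lik_H0 + lik_H1)   # prior odds 1:1

print(round(p_value, 4), round(posterior_H1, 3))
```

The p-value (about 0.011) reports an error rate of a procedure; the posterior reports credence in a hypothesis. Neither number is translatable into the other without further assumptions.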

Critical Assessment

Bayesian confirmation theory has earned its dominant position through genuine explanatory power. It provides a unified framework that handles diverse confirmation puzzles, accommodates background knowledge naturally, and connects to rational decision theory through the expected utility framework. Its formal elegance is unmatched by any rival account.

Yet several serious objections persist:

  • The problem of logical omniscience: Bayesian agents are assumed to know all logical consequences of their beliefs — an idealisation that no real agent satisfies.
  • Computational intractability: In realistic cases, calculating posteriors over complex hypothesis spaces is computationally infeasible without strong simplifying assumptions.
  • The catch-all hypothesis: The denominator $P(E)$ requires summing over all possible hypotheses, including ones not yet conceived. This is the “catch-all” problem (Shimony).
  • Value-laden priors: In practice, the choice of priors can encode social and political values, raising concerns about the claimed objectivity of Bayesian reasoning.

Despite these challenges, Bayesian confirmation theory remains the most developed, most precise, and most widely endorsed philosophical account of the evidence-hypothesis relationship. Understanding its strengths and limitations is essential for any serious engagement with the epistemology of science.
