Part II: Inference | Chapter 2

Limit Theorems

Laws of large numbers, central limit theorem, delta method, and concentration inequalities

Historical Context

The law of large numbers was first proved by Jacob Bernoulli in 1713 for the special case of Bernoulli trials; the general weak law was established by Chebyshev in 1867 and refined by Khintchine in 1929. The strong law in its modern i.i.d. form is due to Kolmogorov (1933). The central limit theorem has a rich history: de Moivre (1733) and Laplace (1812) proved it for specific cases, Lyapunov gave sufficient conditions in 1901, Lindeberg published his sufficient condition in 1922, and Feller established its partial converse in 1935, completing the Lindeberg–Feller theorem. The delta method traces back to the early 20th century, while the modern theory of large deviations was pioneered by Harald Cramér (1938) and greatly extended by S. R. S. Varadhan, who received the Abel Prize in 2007 for his contributions.

2.1 Modes of Convergence

Let $X_1, X_2, \ldots$ be a sequence of random variables and $X$ a target random variable. There are several notions of convergence:

Almost Sure Convergence

$$X_n \xrightarrow{\text{a.s.}} X \iff P\!\left(\lim_{n \to \infty} X_n = X\right) = 1$$

Convergence in Probability

$$X_n \xrightarrow{P} X \iff \forall \epsilon > 0, \; P(|X_n - X| > \epsilon) \to 0$$

Convergence in Distribution

$$X_n \xrightarrow{d} X \iff F_{X_n}(x) \to F_X(x) \text{ at all continuity points of } F_X$$

The implications are: a.s. convergence $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution. Convergence in $L^p$ (i.e., $E[|X_n - X|^p] \to 0$) implies convergence in probability. None of the reverse implications hold in general, except that convergence in distribution to a constant implies convergence in probability.
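These distinctions can be probed numerically. As an illustrative sketch (the sequence $X_n = X + Z_n/\sqrt{n}$ is our own choice, not a construction from this chapter), we estimate $P(|X_n - X| > \epsilon)$ by simulation and watch it vanish, which is exactly convergence in probability:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sequence: X_n = X + Z_n / sqrt(n) with Z_n ~ N(0, 1).
# Then P(|X_n - X| > eps) = P(|Z| > eps * sqrt(n)) -> 0, which is
# exactly convergence in probability; we estimate it by Monte Carlo.
eps = 0.1
for n in [10, 100, 1000, 10_000]:
    z = rng.standard_normal(100_000)
    prob = np.mean(np.abs(z) / np.sqrt(n) > eps)
    print(f"n={n:>6}: estimated P(|X_n - X| > {eps}) = {prob:.4f}")
```

Note that this particular sequence also converges almost surely and in $L^2$; separating the modes requires more contrived examples (e.g., the classical "typewriter" sequence of sliding indicator functions, which converges in probability but not almost surely).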

2.2 Law of Large Numbers

Weak Law (WLLN)

Let $X_1, X_2, \ldots$ be i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$. Then

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} \mu$$

Proof sketch (Chebyshev): $\text{Var}(\bar{X}_n) = \sigma^2/n$, so by Chebyshev, $P(|\bar{X}_n - \mu| > \epsilon) \le \sigma^2 / (n\epsilon^2) \to 0$.
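The proof sketch can be checked directly: the Chebyshev bound $\sigma^2/(n\epsilon^2)$ should dominate the simulated deviation probability. A minimal sketch using $\text{Exp}(1)$ samples (our illustrative choice; here $\mu = \sigma^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(1)
eps, sigma2 = 0.1, 1.0  # Exp(1): mu = 1, sigma^2 = 1

# Empirical P(|mean - mu| > eps) versus the Chebyshev bound sigma^2/(n*eps^2)
for n in [100, 400, 1600]:
    means = rng.exponential(1.0, size=(20_000, n)).mean(axis=1)
    empirical = np.mean(np.abs(means - 1.0) > eps)
    bound = sigma2 / (n * eps**2)
    print(f"n={n:>4}: empirical={empirical:.4f}  Chebyshev bound={min(bound, 1.0):.4f}")
```

The bound is loose (it decays like $1/n$ while the true probability decays much faster here), but it is all the WLLN proof needs.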

Strong Law (SLLN)

If $X_1, X_2, \ldots$ are i.i.d. with $E[|X_1|] < \infty$ and $\mu = E[X_1]$, then

$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu$$

The SLLN requires only a finite first moment, not a finite variance. The proof uses the Borel–Cantelli lemma and truncation arguments. The distinction matters: under the SLLN, the sample mean converges for every sample path (except a set of probability zero), not merely in a probabilistic sense.
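To see the first-moment-only condition at work, here is a sketch with a heavy-tailed distribution (our own choice of example): NumPy's `rng.pareto(a)` samples a Pareto variable with mean $1/(a-1)$ for $a > 1$ but infinite variance when $a \le 2$, so Chebyshev-based arguments fail, yet the SLLN still applies:

```python
import numpy as np

rng = np.random.default_rng(5)

# Pareto with shape a = 1.5: finite mean 1/(a-1) = 2, infinite variance.
# The running mean along a single sample path still settles near 2,
# although convergence is slow for such heavy tails.
x = rng.pareto(1.5, size=1_000_000)
running = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [100, 10_000, 1_000_000]:
    print(f"n={n:>9}: running mean = {running[n - 1]:.3f}")
```

Occasional enormous observations cause visible jumps in the path, which is precisely why a finite-variance proof cannot work here.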

2.3 Central Limit Theorem

Classical CLT

Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$. Then

$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$$

Equivalently, $\bar{X}_n \approx N(\mu, \sigma^2/n)$ for large $n$.
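A quick numerical check of the normal approximation, with $\text{Exp}(1)$ as a deliberately skewed source distribution (an illustrative choice; $\mu = \sigma = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20_000

# Standardized sample means sqrt(n)*(mean - mu)/sigma for Exp(1);
# their distribution should be close to N(0, 1) despite the skewness.
z = np.sqrt(n) * (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0)
print("P(Z <= 1.96)  ~", np.mean(z <= 1.96))   # roughly 0.975
print("P(Z <= -1.96) ~", np.mean(z <= -1.96))  # roughly 0.025
```

The small residual discrepancy reflects the skewness of the source distribution, which vanishes at rate $1/\sqrt{n}$; this is exactly what the Berry–Esseen theorem below quantifies.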

Lindeberg–Feller CLT

For independent (not necessarily identically distributed) $X_1, \ldots, X_n$ with $E[X_i] = \mu_i$, $\text{Var}(X_i) = \sigma_i^2$, and $s_n^2 = \sum_{i=1}^n \sigma_i^2$, we have $\frac{1}{s_n}\sum_{i=1}^n (X_i - \mu_i) \xrightarrow{d} N(0, 1)$ provided the Lindeberg condition is satisfied:

$$\forall \epsilon > 0: \quad \frac{1}{s_n^2} \sum_{i=1}^n E\!\left[(X_i - \mu_i)^2 \mathbf{1}_{|X_i - \mu_i| > \epsilon s_n}\right] \to 0$$

The Berry–Esseen theorem quantifies the rate of convergence in the i.i.d. case: $\sup_x |F_n(x) - \Phi(x)| \le \frac{C \rho}{\sigma^3 \sqrt{n}}$, where $F_n$ is the CDF of the standardized mean, $\rho = E[|X_1 - \mu|^3]$, and $C \le 0.4748$.
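The bound is easy to evaluate in closed form for simple distributions. A sketch for Bernoulli($p$), where $\sigma^2 = p(1-p)$ and $\rho = p(1-p)\bigl(p^2 + (1-p)^2\bigr)$:

```python
import math

def berry_esseen_bound(p, n, C=0.4748):
    """Berry-Esseen bound C*rho/(sigma^3 * sqrt(n)) for Bernoulli(p) means."""
    sigma = math.sqrt(p * (1 - p))
    rho = p * (1 - p) * (p**2 + (1 - p)**2)  # E|X - p|^3
    return C * rho / (sigma**3 * math.sqrt(n))

# For p = 1/2 the bound reduces to C / sqrt(n):
for n in [100, 1_000, 10_000]:
    print(f"n={n:>6}: sup |F_n - Phi| <= {berry_esseen_bound(0.5, n):.5f}")
```

The $1/\sqrt{n}$ rate is sharp in general, although the actual error for symmetric distributions like Bernoulli(1/2) is often much smaller than the bound.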

2.4 The Delta Method

First-Order Delta Method

If $\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is continuously differentiable with $g'(\theta) \ne 0$, then

$$\sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{d} N(0, \sigma^2 [g'(\theta)]^2)$$

Second-Order Delta Method

When $g'(\theta) = 0$ but $g''(\theta) \ne 0$:

$$n(g(T_n) - g(\theta)) \xrightarrow{d} \frac{\sigma^2 g''(\theta)}{2} \chi^2_1$$

The delta method underpins variance-stabilizing transformations. For example, if $X \sim \text{Poisson}(\lambda)$, applying the first-order method with $g(x) = \sqrt{x}$ shows that $\sqrt{X}$ has approximately constant variance $1/4$ for large $\lambda$, regardless of $\lambda$.
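A simulation sketch of the variance-stabilization claim (the $\lambda$ values are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Var(X) = lambda grows without bound, but Var(sqrt(X)) stays near the
# delta-method value 1/4 once lambda is moderately large.
for lam in [5, 20, 100]:
    x = rng.poisson(lam, size=200_000)
    print(f"lambda={lam:>3}: Var(X)={x.var():7.2f}  Var(sqrt(X))={np.sqrt(x).var():.4f}")
```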

Multivariate extension: If $\sqrt{n}(\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} N(\mathbf{0}, \Sigma)$ and $g: \mathbb{R}^k \to \mathbb{R}$ is differentiable, then $\sqrt{n}(g(\mathbf{T}_n) - g(\boldsymbol{\theta})) \xrightarrow{d} N(0, \nabla g(\boldsymbol{\theta})^T \Sigma \nabla g(\boldsymbol{\theta}))$.

2.5 Large Deviations and Concentration

Hoeffding's Inequality

Let $X_1, \ldots, X_n$ be independent with $a_i \le X_i \le b_i$ a.s., and let $\mu = E[\bar{X}_n]$. Then for $t > 0$:

$$P\!\left(\bar{X}_n - \mu \ge t\right) \le \exp\!\left(-\frac{2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$

For $X_i \in [0, 1]$ this simplifies to $P(|\bar{X}_n - \mu| \ge t) \le 2e^{-2nt^2}$. This provides an exponential tail bound, far tighter than Chebyshev for bounded variables.
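A sketch comparing the two-sided bound with simulation for Bernoulli(1/2) variables (the values of $n$ and $t$ are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 500, 0.05

# Two-sided Hoeffding bound 2*exp(-2*n*t^2) versus the simulated
# deviation probability of the mean of n Bernoulli(1/2) variables.
means = rng.binomial(1, 0.5, size=(20_000, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(f"empirical = {empirical:.4f}   Hoeffding bound = {bound:.4f}")
```

For comparison, the Chebyshev bound at these parameters is $\sigma^2/(n t^2) = 0.25/(500 \cdot 0.0025) = 0.2$, already weaker than Hoeffding's $2e^{-2.5} \approx 0.164$, and the gap widens rapidly as $n$ grows.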

Cramér's Large Deviation Principle

For i.i.d. $X_i$ with MGF $M(t)$ finite in a neighborhood of 0, the rate function is the Legendre transform of $\log M(t)$:

$$I(x) = \sup_{t \in \mathbb{R}} \left\{ tx - \log M(t) \right\}$$

Then $P(\bar{X}_n \ge a) \approx e^{-nI(a)}$ for $a > \mu$, in the precise sense that $\frac{1}{n}\log P(\bar{X}_n \ge a) \to -I(a)$. Large deviations theory provides precise asymptotics for rare-event probabilities that the CLT cannot capture.
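For Bernoulli($p$) the supremum can be solved in closed form, giving $I(a) = a\log\frac{a}{p} + (1-a)\log\frac{1-a}{1-p}$. A sketch comparing the exact binomial tail with $e^{-nI(a)}$, which by the Chernoff bound is always a valid upper bound:

```python
import math

def rate_function_bernoulli(a, p=0.5):
    """Closed-form Cramer rate function I(a) for Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

# Exact tail P(mean >= a) for n Bernoulli(1/2) trials versus exp(-n*I(a)).
# The two agree on the exponential scale; the prefactor differs by a
# polynomial factor that the LDP does not resolve.
a, n = 0.6, 200
exact = sum(math.comb(n, k) for k in range(int(a * n), n + 1)) / 2**n
ldp = math.exp(-n * rate_function_bernoulli(a))
print(f"exact tail = {exact:.3e}   exp(-n*I(a)) = {ldp:.3e}")
```

Note that a normal approximation at $a = 0.6$ would require evaluating the Gaussian tail about $2.8$ standard deviations out, where its relative error is already substantial; the rate function captures the correct exponential decay for any fixed $a > \mu$.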

Computational Laboratory

We visualize the LLN convergence, the CLT in action for various source distributions, the Berry–Esseen bound, and Hoeffding's exponential concentration.

LLN, CLT, Berry-Esseen & Hoeffding Concentration

Python script: `script.py` (98 lines).
