Limit Theorems
Laws of large numbers, central limit theorem, delta method, and concentration inequalities
Historical Context
The law of large numbers was first proved by Jacob Bernoulli in 1713 for the special case of Bernoulli trials; the general weak law was established by Chebyshev in 1867 and refined by Khintchine in 1929. The strong law was proved by Kolmogorov in 1933. The central limit theorem has a rich history: de Moivre (1733) and Laplace (1812) proved it for specific cases, Lyapunov gave sufficient conditions in 1901, Lindeberg published his general condition in 1922, and Feller proved its necessity in 1935, completing the Lindeberg–Feller theorem. The delta method traces back to the early 20th century, while the modern theory of large deviations was pioneered by Harald Cramér (1938) and greatly extended by S. R. S. Varadhan, who received the Abel Prize in 2007 for his contributions.
2.1 Modes of Convergence
Let $X_1, X_2, \ldots$ be a sequence of random variables and $X$ a target random variable. There are several notions of convergence:
Almost Sure Convergence
$X_n \xrightarrow{a.s.} X$ if $P\left(\lim_{n \to \infty} X_n = X\right) = 1$.
Convergence in Probability
$X_n \xrightarrow{p} X$ if $P(|X_n - X| > \epsilon) \to 0$ for every $\epsilon > 0$.
Convergence in Distribution
$X_n \xrightarrow{d} X$ if $F_{X_n}(x) \to F_X(x)$ at every continuity point $x$ of $F_X$.
The implications are: a.s. convergence $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution. Convergence in $L^p$ (i.e., $E[|X_n - X|^p] \to 0$) implies convergence in probability. None of the reverse implications hold in general, except that convergence in distribution to a constant implies convergence in probability.
2.2 Law of Large Numbers
Weak Law (WLLN)
Let $X_1, X_2, \ldots$ be i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$. Then
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} \mu.$$
Proof sketch (Chebyshev): $\text{Var}(\bar{X}_n) = \sigma^2/n$, so by Chebyshev, $P(|\bar{X}_n - \mu| > \epsilon) \le \sigma^2 / (n\epsilon^2) \to 0$.
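The Chebyshev argument above can be checked by simulation. The sketch below (all choices are mine: Exp(1) draws so that $\mu = \sigma^2 = 1$, $\epsilon = 0.2$, and a fixed seed) compares the empirical tail probability $P(|\bar{X}_n - \mu| > \epsilon)$ with the bound $\sigma^2/(n\epsilon^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, eps = 1.0, 1.0, 0.2      # Exp(1) has mean 1 and variance 1
reps = 20_000

for n in (10, 100, 1000):
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    empirical = np.mean(np.abs(means - mu) > eps)
    bound = min(sigma2 / (n * eps**2), 1.0)   # Chebyshev: sigma^2 / (n eps^2)
    print(f"n={n:5d}  P(|Xbar-mu|>{eps}) ~ {empirical:.4f}   Chebyshev bound = {bound:.4f}")
```

Both columns shrink with $n$, and the empirical tail sits well below the bound, since Chebyshev is far from tight for light-tailed data.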
Strong Law (SLLN)
If $X_1, X_2, \ldots$ are i.i.d. with $E[|X_1|] < \infty$, then
$$\bar{X}_n \xrightarrow{a.s.} \mu, \quad \text{where } \mu = E[X_1].$$
The SLLN requires only a finite first moment, not a finite variance. The proof uses the Borel–Cantelli lemma and truncation arguments. The distinction matters: under the SLLN, the sample mean converges for every sample path (except a set of probability zero), not merely in a probabilistic sense.
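The pathwise nature of the SLLN, and the need for a finite first moment, can be illustrated along a single sample path. In the sketch below (my choices: $n = 10^5$, a fixed seed), the running mean of Exp(1) draws settles near 1, while the running mean of standard Cauchy draws, which have no mean, keeps lurching.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
steps = np.arange(1, n + 1)
exp_path = np.cumsum(rng.exponential(1.0, n)) / steps        # E|X| < inf: SLLN applies
cauchy_path = np.cumsum(rng.standard_cauchy(n)) / steps      # no mean: SLLN fails

print("Exp(1) running mean at n = 1e3, 1e4, 1e5: ", np.round(exp_path[[999, 9999, -1]], 4))
print("Cauchy running mean at n = 1e3, 1e4, 1e5:", np.round(cauchy_path[[999, 9999, -1]], 4))
```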
2.3 Central Limit Theorem
Classical CLT
Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$. Then
$$\sqrt{n}\, \frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1).$$
Equivalently, $\bar{X}_n \approx N(\mu, \sigma^2/n)$ for large $n$.
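A quick check of the normal approximation for a markedly skewed source: the sketch below (my choices: Exp(1) draws, so $\mu = \sigma = 1$, a fixed seed) compares $P(Z_n \le 1)$ for the standardized mean $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ against $\Phi(1) \approx 0.8413$.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(2)
reps = 100_000
for n in (2, 10, 100):
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    z = sqrt(n) * (xbar - 1.0)            # mu = sigma = 1 for Exp(1)
    print(f"n={n:3d}  P(Z_n <= 1) ~ {np.mean(z <= 1):.4f}   Phi(1) = {norm_cdf(1):.4f}")
```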
Lindeberg–Feller CLT
For independent (not necessarily identically distributed) $X_1, \ldots, X_n$ with $E[X_i] = \mu_i$, $\text{Var}(X_i) = \sigma_i^2$, and $s_n^2 = \sum_{i=1}^n \sigma_i^2$, the CLT holds provided the Lindeberg condition is satisfied:
$$\frac{1}{s_n^2} \sum_{i=1}^n E\left[(X_i - \mu_i)^2 \, \mathbf{1}\{|X_i - \mu_i| > \epsilon s_n\}\right] \to 0 \quad \text{for every } \epsilon > 0.$$
The Berry–Esseen theorem quantifies the rate of convergence: $\sup_x |F_n(x) - \Phi(x)| \le \frac{C \rho}{\sigma^3 \sqrt{n}}$, where $F_n$ is the distribution function of $\sqrt{n}(\bar{X}_n - \mu)/\sigma$, $\rho = E[|X_1 - \mu|^3]$, and $C \le 0.4748$.
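The bound can be tested numerically by estimating the Kolmogorov distance from an empirical CDF. A sketch, under my own choices (Exp(1) source, for which $\sigma = 1$ and $\rho = E|X - 1|^3 = 12/e - 2 \approx 2.4146$ exactly, a fixed seed, and Monte Carlo replications standing in for the true CDF of the standardized mean):

```python
import numpy as np
from math import erf, sqrt, e

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rho = 12.0 / e - 2.0        # E|X - 1|^3 for Exp(1), ~2.4146
rng = np.random.default_rng(3)
reps = 100_000

for n in (5, 20, 100):
    z = np.sort(sqrt(n) * (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0))
    gauss = np.array([norm_cdf(v) for v in z])
    hi = np.arange(1, reps + 1) / reps          # ECDF value just after each jump
    lo = np.arange(0, reps) / reps              # ECDF value just before each jump
    sup_dist = max(np.max(np.abs(hi - gauss)), np.max(np.abs(lo - gauss)))
    bound = 0.4748 * rho / sqrt(n)
    print(f"n={n:3d}  sup|F_n - Phi| ~ {sup_dist:.4f}   Berry-Esseen bound = {bound:.4f}")
```

The observed distance decays like $1/\sqrt{n}$, as the theorem predicts, and sits well inside the bound: the constant is worst-case over all distributions.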
2.4 The Delta Method
First-Order Delta Method
If $\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is continuously differentiable with $g'(\theta) \ne 0$, then
$$\sqrt{n}\,(g(T_n) - g(\theta)) \xrightarrow{d} N\left(0, \sigma^2 [g'(\theta)]^2\right).$$
Second-Order Delta Method
When $g'(\theta) = 0$ but $g''(\theta) \ne 0$:
$$n\,(g(T_n) - g(\theta)) \xrightarrow{d} \frac{\sigma^2 g''(\theta)}{2}\, \chi^2_1.$$
The delta method also yields variance-stabilizing transformations. For example, if $X \sim \text{Poisson}(\lambda)$, then $\sqrt{X}$ has approximately constant variance $1/4$ for large $\lambda$: take $g(x) = \sqrt{x}$ in the first-order expansion, so $\text{Var}(\sqrt{X}) \approx \lambda \cdot [g'(\lambda)]^2 = \lambda \cdot \frac{1}{4\lambda} = \frac{1}{4}$.
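The square-root transformation's stabilizing effect is easy to verify by simulation. A minimal sketch, with my own choices of $\lambda$ values, sample size, and seed:

```python
import numpy as np

rng = np.random.default_rng(4)
for lam in (5, 25, 100):
    x = rng.poisson(lam, size=200_000)
    # delta method: Var(sqrt(X)) ~ lam * (1 / (2 sqrt(lam)))^2 = 1/4, regardless of lam
    print(f"lambda={lam:3d}  Var(sqrt(X)) ~ {np.sqrt(x).var():.4f}   delta-method value: 0.25")
```

The approximation sharpens as $\lambda$ grows, since the expansion around $\lambda$ becomes more accurate.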
Multivariate extension: If $\sqrt{n}(\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} N(\mathbf{0}, \Sigma)$ and $g: \mathbb{R}^k \to \mathbb{R}$ is differentiable, then $\sqrt{n}(g(\mathbf{T}_n) - g(\boldsymbol{\theta})) \xrightarrow{d} N(0, \nabla g(\boldsymbol{\theta})^T \Sigma \nabla g(\boldsymbol{\theta}))$.
2.5 Large Deviations and Concentration
Hoeffding's Inequality
Let $X_1, \ldots, X_n$ be independent with $a_i \le X_i \le b_i$ a.s. Then for $t > 0$:
$$P\left(\left|\bar{X}_n - E[\bar{X}_n]\right| \ge t\right) \le 2 \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
For $X_i \in [0, 1]$ this simplifies to $P(|\bar{X}_n - \mu| \ge t) \le 2e^{-2nt^2}$. This provides an exponential tail bound, far tighter than Chebyshev for bounded variables.
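The gap between the two bounds shows up already at modest sample sizes. The sketch below (my choices: fair-coin flips, $n = 100$, $t = 0.15$, a fixed seed) compares the empirical tail with the Hoeffding bound $2e^{-2nt^2}$ and the Chebyshev bound $\frac{1/4}{nt^2}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, t, reps = 100, 0.15, 100_000
means = rng.integers(0, 2, size=(reps, n)).mean(axis=1)   # fair-coin sample means
empirical = np.mean(np.abs(means - 0.5) >= t)
hoeffding = 2 * np.exp(-2 * n * t**2)                     # exponential in n t^2
chebyshev = 0.25 / (n * t**2)                             # only polynomial in n t^2
print(f"empirical tail = {empirical:.4f}   Hoeffding = {hoeffding:.4f}   Chebyshev = {chebyshev:.4f}")
```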
Cramér's Large Deviation Principle
For i.i.d. $X_i$ with MGF $M(t)$ finite in a neighborhood of 0, the rate function is the Legendre transform of $\log M(t)$:
$$I(x) = \sup_{t \in \mathbb{R}} \left[ t x - \log M(t) \right].$$
Then $P(\bar{X}_n \ge a) \approx e^{-nI(a)}$ for $a > \mu$, in the precise sense that $\frac{1}{n} \log P(\bar{X}_n \ge a) \to -I(a)$. Large deviations theory provides precise asymptotics for rare-event probabilities that the CLT cannot capture.
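The Legendre transform can be computed numerically and checked against a closed form. For Bernoulli(1/2) (my choice of example), $M(t) = (1 + e^t)/2$ and the supremum works out to the KL divergence $I(a) = a \log(2a) + (1-a)\log(2(1-a))$; the sketch below compares the two (grid range and resolution are arbitrary choices):

```python
import numpy as np

def rate_numeric(a, grid=np.linspace(-20.0, 20.0, 200_001)):
    # sup_t [t a - log M(t)] over a fine grid, with M(t) = (1 + e^t)/2
    return np.max(grid * a - np.log((1.0 + np.exp(grid)) / 2.0))

def rate_closed(a):
    # closed form of the Legendre transform: KL(Bernoulli(a) || Bernoulli(1/2))
    return a * np.log(2 * a) + (1 - a) * np.log(2 * (1 - a))

for a in (0.6, 0.7, 0.9):
    print(f"a={a}:  numeric I(a) = {rate_numeric(a):.5f}   closed form = {rate_closed(a):.5f}")
```

The maximizing $t^* = \log\frac{a}{1-a}$ lies well inside the grid for these $a$, so the grid maximum matches the supremum to high accuracy.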
Computational Laboratory
We visualize the LLN convergence, the CLT in action for various source distributions, the Berry–Esseen bound, and Hoeffding's exponential concentration.
LLN, CLT, Berry-Esseen & Hoeffding Concentration
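A text-only sketch of the laboratory's CLT panel (the plotting script itself is not reproduced here; the source distributions, sample sizes, and seed below are my own choices): standardized means from three different sources all approach $N(0,1)$, and the printed skewness of $Z_n$ shrinks toward the normal's value of 0.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 50_000, 200
# each source: (sampler, true mean, true standard deviation)
sources = {
    "uniform":     (lambda s: rng.uniform(0.0, 1.0, s),            0.5, np.sqrt(1.0 / 12.0)),
    "exponential": (lambda s: rng.exponential(1.0, s),             1.0, 1.0),
    "bernoulli":   (lambda s: rng.integers(0, 2, s).astype(float), 0.5, 0.5),
}
for name, (draw, mu, sigma) in sources.items():
    z = np.sqrt(n) * (draw((reps, n)).mean(axis=1) - mu) / sigma
    skew = np.mean(((z - z.mean()) / z.std()) ** 3)
    print(f"{name:12s}  mean = {z.mean():+.3f}  var = {z.var():.3f}  skew = {skew:+.3f}")
```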