Part II: Inference | Chapter 2

Limit Theorems

Laws of large numbers, central limit theorem, delta method, and concentration inequalities

Historical Context

The law of large numbers was first proved by Jacob Bernoulli in 1713 for the special case of Bernoulli trials; the general weak law was established by Chebyshev in 1867 and refined by Khintchine in 1929. The strong law was proved by Kolmogorov in 1933. The central limit theorem has a rich history: de Moivre (1733) and Laplace (1812) proved it for specific cases, Lyapunov gave sufficient conditions in 1901, and the definitive Lindeberg-Feller theorem appeared in 1920–1922. The delta method traces back to the early 20th century, while the modern theory of large deviations was pioneered by Harald Cramér (1938) and greatly extended by S. R. S. Varadhan, who received the Abel Prize in 2007 for his contributions.

2.1 Modes of Convergence

Let $X_1, X_2, \ldots$ be a sequence of random variables and $X$ a target random variable. There are several notions of convergence:

Almost Sure Convergence

$$X_n \xrightarrow{\text{a.s.}} X \iff P\!\left(\lim_{n \to \infty} X_n = X\right) = 1$$

Convergence in Probability

$$X_n \xrightarrow{P} X \iff \forall \epsilon > 0, \; P(|X_n - X| > \epsilon) \to 0$$

Convergence in Distribution

$$X_n \xrightarrow{d} X \iff F_{X_n}(x) \to F_X(x) \text{ at all continuity points of } F_X$$

The implications are: a.s. convergence $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution. Convergence in $L^p$ (i.e., $E[|X_n - X|^p] \to 0$) implies convergence in probability. None of the reverse implications hold in general, except that convergence in distribution to a constant implies convergence in probability.

2.2 Law of Large Numbers

Weak Law (WLLN)

Let $X_1, X_2, \ldots$ be i.i.d. with $E[X_i] = \mu$ and$\text{Var}(X_i) = \sigma^2 < \infty$. Then

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} \mu$$

Proof sketch (Chebyshev): $\text{Var}(\bar{X}_n) = \sigma^2/n$, so by Chebyshev, $P(|\bar{X}_n - \mu| > \epsilon) \le \sigma^2 / (n\epsilon^2) \to 0$.

Strong Law (SLLN)

If $X_1, X_2, \ldots$ are i.i.d. with $E[|X_1|] < \infty$, then

$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu$$

The SLLN requires only a finite first moment, not a finite variance. The proof uses the Borel–Cantelli lemma and truncation arguments. The distinction matters: under the SLLN, the sample mean converges for every sample path (except a set of probability zero), not merely in a probabilistic sense.

2.3 Central Limit Theorem

Classical CLT

Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$. Then

$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$$

Equivalently, $\bar{X}_n \approx N(\mu, \sigma^2/n)$ for large $n$.

Lindeberg–Feller CLT

For independent (not necessarily identically distributed) $X_1, \ldots, X_n$with $E[X_i] = \mu_i$, $\text{Var}(X_i) = \sigma_i^2$, and$s_n^2 = \sum_{i=1}^n \sigma_i^2$, the CLT holds provided the Lindeberg condition is satisfied:

$$\forall \epsilon > 0: \quad \frac{1}{s_n^2} \sum_{i=1}^n E\!\left[(X_i - \mu_i)^2 \mathbf{1}_{|X_i - \mu_i| > \epsilon s_n}\right] \to 0$$

The Berry–Esseen theorem quantifies the rate of convergence:$\sup_x |F_n(x) - \Phi(x)| \le \frac{C \rho}{\sigma^3 \sqrt{n}}$where $\rho = E[|X_1 - \mu|^3]$ and $C \le 0.4748$.

2.4 The Delta Method

First-Order Delta Method

If $\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and$g$ is continuously differentiable with $g'(\theta) \ne 0$, then

$$\sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{d} N(0, \sigma^2 [g'(\theta)]^2)$$

Second-Order Delta Method

When $g'(\theta) = 0$ but $g''(\theta) \ne 0$:

$$n(g(T_n) - g(\theta)) \xrightarrow{d} \frac{\sigma^2 g''(\theta)}{2} \chi^2_1$$

This is useful for variance-stabilizing transformations. For example, if $X \sim \text{Poisson}(\lambda)$, then $\sqrt{X}$ has approximately constant variance $1/4$ for large $\lambda$.

Multivariate extension: If $\sqrt{n}(\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} N(\mathbf{0}, \Sigma)$and $g: \mathbb{R}^k \to \mathbb{R}$ is differentiable, then$\sqrt{n}(g(\mathbf{T}_n) - g(\boldsymbol{\theta})) \xrightarrow{d} N(0, \nabla g(\boldsymbol{\theta})^T \Sigma \nabla g(\boldsymbol{\theta}))$.

2.5 Large Deviations and Concentration

Hoeffding's Inequality

Let $X_1, \ldots, X_n$ be independent with $a_i \le X_i \le b_i$a.s. Then for $t > 0$:

$$P\!\left(\bar{X}_n - \mu \ge t\right) \le \exp\!\left(-\frac{2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$

For $X_i \in [0, 1]$ this simplifies to$P(|\bar{X}_n - \mu| \ge t) \le 2e^{-2nt^2}$. This provides anexponential tail bound, far tighter than Chebyshev for bounded variables.

Cramér's Large Deviation Principle

For i.i.d. $X_i$ with MGF $M(t)$ finite in a neighborhood of 0, the rate function is the Legendre transform of $\log M(t)$:

$$I(x) = \sup_{t \in \mathbb{R}} \left\{ tx - \log M(t) \right\}$$

Then $P(\bar{X}_n \ge a) \approx e^{-nI(a)}$ for $a > \mu$. Large deviations theory provides precise asymptotics for rare-event probabilities that the CLT cannot capture.

Computational Laboratory

We visualize the LLN convergence, the CLT in action for various source distributions, the Berry–Esseen bound, and Hoeffding's exponential concentration.

LLN, CLT, Berry-Esseen & Hoeffding Concentration

Python

script.py98 lines

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.patch.set_facecolor('#0f172a')
for ax in axes.flat:
    ax.set_facecolor('#0f172a')
    ax.tick_params(colors='#d1d5db')
    ax.xaxis.label.set_color('#d1d5db')
    ax.yaxis.label.set_color('#d1d5db')
    for spine in ax.spines.values():
        spine.set_color('#10b981')

# --- Panel 1: LLN Convergence ---
ax1 = axes[0, 0]
N = 10000
samples = np.random.exponential(2, size=N)
running_mean = np.cumsum(samples) / np.arange(1, N + 1)
ax1.plot(running_mean, color='#10b981', linewidth=1.5, label='Running mean')
ax1.axhline(2.0, color='#f87171', linestyle='--', linewidth=2, label='True mean = 2')
ax1.set_xscale('log')
ax1.set_title('LLN: Running Mean of Exp(2)', color='#10b981', fontsize=13, fontweight='bold')
ax1.set_xlabel('n (log scale)')
ax1.set_ylabel('Sample mean')
ax1.legend(fontsize=9, facecolor='#1e293b', edgecolor='#10b981', labelcolor='#d1d5db')

# --- Panel 2: CLT for different distributions ---
ax2 = axes[0, 1]
ns = [2, 5, 30, 100]
colors_list = ['#10b981', '#059669', '#34d399', '#6ee7b7']
x_grid = np.linspace(-4, 4, 300)
for n, c in zip(ns, colors_list):
    means = []
    for _ in range(5000):
        s = np.random.exponential(1, size=n)
        z = np.sqrt(n) * (np.mean(s) - 1) / 1.0
        means.append(z)
    ax2.hist(means, bins=50, density=True, alpha=0.4, color=c, label=f'n={n}')
ax2.plot(x_grid, stats.norm.pdf(x_grid), color='#f87171', linewidth=2, linestyle='--', label='N(0,1)')
ax2.set_title('CLT: Standardized Means (Exp source)', color='#10b981', fontsize=13, fontweight='bold')
ax2.set_xlabel('Standardized mean')
ax2.legend(fontsize=8, facecolor='#1e293b', edgecolor='#10b981', labelcolor='#d1d5db')

# --- Panel 3: Berry-Esseen ---
ax3 = axes[1, 0]
n_values = np.arange(5, 501, 5)
sup_diffs = []
for n in n_values:
    means = np.array([np.sqrt(n) * (np.mean(np.random.exponential(1, n)) - 1)
                       for _ in range(3000)])
    means_sorted = np.sort(means)
    ecdf = np.arange(1, len(means_sorted) + 1) / len(means_sorted)
    ncdf = stats.norm.cdf(means_sorted)
    sup_diffs.append(np.max(np.abs(ecdf - ncdf)))
rho = 2.0  # third absolute moment of Exp(1)
be_bound = 0.4748 * rho / np.sqrt(n_values)
ax3.plot(n_values, sup_diffs, color='#10b981', linewidth=2, label='Empirical sup|Fn - Phi|')
ax3.plot(n_values, be_bound, color='#f87171', linewidth=2, linestyle='--', label='Berry-Esseen bound')
ax3.set_title('Berry-Esseen Convergence Rate', color='#10b981', fontsize=13, fontweight='bold')
ax3.set_xlabel('n')
ax3.set_ylabel('sup |Fn(x) - Phi(x)|')
ax3.legend(fontsize=9, facecolor='#1e293b', edgecolor='#10b981', labelcolor='#d1d5db')

# --- Panel 4: Hoeffding bound ---
ax4 = axes[1, 1]
n = 100
ts = np.linspace(0.01, 0.5, 50)
hoeffding = 2 * np.exp(-2 * n * ts**2)
empirical_probs = []
mu = 0.5
for t in ts:
    count = 0
    for _ in range(5000):
        x_bar = np.mean(np.random.uniform(0, 1, n))
        if abs(x_bar - mu) >= t:
            count += 1
    empirical_probs.append(count / 5000)
ax4.semilogy(ts, empirical_probs, color='#10b981', linewidth=2, label='Empirical P(|Xbar-mu|>=t)')
ax4.semilogy(ts, hoeffding, color='#f87171', linewidth=2, linestyle='--', label='Hoeffding: 2exp(-2nt^2)')
chebyshev_bound = (1/12) / (n * ts**2)
ax4.semilogy(ts, chebyshev_bound, color='#fbbf24', linewidth=2, linestyle=':', label='Chebyshev bound')
ax4.set_title("Hoeffding vs Chebyshev (n=100, U[0,1])", color='#10b981', fontsize=13, fontweight='bold')
ax4.set_xlabel('t')
ax4.set_ylabel('Probability (log scale)')
ax4.legend(fontsize=8, facecolor='#1e293b', edgecolor='#10b981', labelcolor='#d1d5db')
ax4.set_ylim(1e-6, 10)

plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0f172a')
print("Limit theorems laboratory complete.")
print(f"LLN: final running mean = {running_mean[-1]:.4f} (true = 2.0)")
print(f"Berry-Esseen: sup diff at n=500 = {sup_diffs[-1]:.4f}, bound = {be_bound[-1]:.4f}")

Click Run to execute the Python code

Code will be executed with Python 3 on the server

←Expectation & Variance Bayesian Inference→

Share:X Reddit LinkedIn