Part I: Probability | Chapter 4

Common Distributions

Bernoulli, binomial, Poisson, uniform, normal, exponential, gamma, and beta distributions

Historical Context

The common probability distributions emerged from specific practical problems. Jacob Bernoulli introduced the Bernoulli trial in his Ars Conjectandi (1713). Abraham de Moivre discovered the normal distribution around 1733 as an approximation to the binomial, though it is often attributed to Gauss (who applied it to astronomical errors in 1809). Siméon Denis Poisson derived his eponymous distribution in 1837 as a limit of the binomial for rare events. The exponential and gamma distributions arose from the study of waiting times and Bayesian inference, while the beta distribution was introduced by Euler in his study of the beta function and later proved essential in Bayesian statistics as the conjugate prior for binomial data.

4.1 Bernoulli and Binomial Distributions

Bernoulli Distribution

$X \sim \text{Bernoulli}(p)$: a single trial with probability $p$ of success.

$$P(X = k) = p^k (1-p)^{1-k}, \quad k \in \{0, 1\}$$

$E[X] = p$, $\text{Var}(X) = p(1-p)$, $M_X(t) = 1 - p + pe^t$.

Binomial Distribution

$X \sim \text{Binomial}(n, p)$: the number of successes in $n$ independent Bernoulli trials.

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

Derivation 1: Binomial PMF from First Principles

Each specific sequence of $k$ successes and $n-k$ failures has probability $p^k(1-p)^{n-k}$ by independence. The number of such sequences is $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, hence the PMF.

Verification: $\sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = 1$ by the binomial theorem.

Mean: $E[X] = E\!\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i] = np$ by linearity, where $X_i \sim \text{Bernoulli}(p)$.

Variance: $\text{Var}(X) = \sum_{i=1}^n \text{Var}(X_i) = np(1-p)$ by independence.

MGF: $M_X(t) = \prod_{i=1}^n M_{X_i}(t) = (1 - p + pe^t)^n$.
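The identities derived above can be verified numerically. The sketch below (with arbitrary illustrative values $n = 10$, $p = 0.3$, and the MGF evaluated at $t = 0.5$) computes the PMF directly and checks the normalization, mean, variance, and MGF:

```python
# Numerical check of the Binomial(n, p) PMF, mean, variance, and MGF.
# n = 10, p = 0.3, t = 0.5 are arbitrary illustrative values.
from math import comb, exp, isclose

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)                                         # binomial theorem: 1
mean = sum(k * q for k, q in enumerate(pmf))             # linearity: n*p = 3
var = sum((k - mean)**2 * q for k, q in enumerate(pmf))  # independence: n*p*(1-p) = 2.1
mgf_direct = sum(exp(0.5 * k) * q for k, q in enumerate(pmf))
mgf_closed = (1 - p + p * exp(0.5))**n                   # (1 - p + p e^t)^n at t = 0.5

print(total, mean, var, mgf_direct, mgf_closed)
```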

4.2 Poisson Distribution

Poisson Distribution

$X \sim \text{Poisson}(\lambda)$: the number of events in a fixed interval when events occur independently at a constant average rate $\lambda$.

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

$E[X] = \lambda$, $\text{Var}(X) = \lambda$, $M_X(t) = e^{\lambda(e^t - 1)}$.

Derivation 2: Poisson as Limit of Binomial

Let $X_n \sim \text{Binomial}(n, \lambda/n)$ where $\lambda$ is fixed and $n \to \infty$. We show $P(X_n = k) \to e^{-\lambda}\lambda^k/k!$:

$$P(X_n = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$$

Factor by factor as $n \to \infty$:

$$\binom{n}{k} \frac{1}{n^k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$$
$$\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}$$
$$\left(1 - \frac{\lambda}{n}\right)^{-k} \to 1$$

Combining: $P(X_n = k) \to \frac{\lambda^k}{k!} \cdot e^{-\lambda} = \frac{\lambda^k e^{-\lambda}}{k!}$. This shows the Poisson arises naturally when there are many independent rare events.
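The limit can be observed numerically: as $n$ grows, the $\text{Binomial}(n, \lambda/n)$ PMF approaches the Poisson PMF at each fixed $k$. The values $\lambda = 4$ and $k = 2$ below are arbitrary illustrative choices:

```python
# Numerical illustration of the Poisson limit theorem.
from math import comb, exp, factorial

lam, k = 4.0, 2
poisson = lam**k * exp(-lam) / factorial(k)

def binom_pmf(n):
    p = lam / n
    return comb(n, k) * p**k * (1 - p)**(n - k)

# |Binomial(n, lam/n) PMF - Poisson(lam) PMF| at k, for growing n
errors = [abs(binom_pmf(n) - poisson) for n in (10, 100, 1000, 10_000)]
print(errors)   # should shrink toward 0
```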

Poisson Process Connection

The Poisson distribution is intimately connected to the Poisson process. If events arrive according to a Poisson process with rate $\lambda$, then the number of events in any interval of length $t$ follows $\text{Poisson}(\lambda t)$. The inter-arrival times are independent $\text{Exponential}(\lambda)$ random variables. This connection links discrete counting distributions to continuous waiting-time distributions.
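This connection can be checked by Monte Carlo: build arrival times as cumulative sums of i.i.d. exponential gaps and count arrivals before time $t$; the counts should have mean and variance near $\lambda t$. The rate, horizon, and trial count below are illustrative choices, and this is a simulation sketch rather than a proof:

```python
# Monte Carlo sketch: exponential inter-arrival times generate Poisson counts.
import numpy as np

rng = np.random.default_rng(0)
lam, t, trials = 2.0, 5.0, 20_000

# 40 gaps per trial is far more than the ~lam*t = 10 expected arrivals
gaps = rng.exponential(1 / lam, size=(trials, 40))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times < t).sum(axis=1)   # events in [0, t)

print(counts.mean(), counts.var())   # both should be near lam*t = 10
```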

4.3 Uniform and Normal Distributions

Continuous Uniform Distribution

$X \sim \text{Uniform}(a, b)$:

$$f_X(x) = \frac{1}{b - a}, \quad a \leq x \leq b$$

$E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$, $M_X(t) = \frac{e^{tb} - e^{ta}}{t(b-a)}$ for $t \neq 0$, with $M_X(0) = 1$.

Normal (Gaussian) Distribution

$X \sim \mathcal{N}(\mu, \sigma^2)$:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}$$

$E[X] = \mu$, $\text{Var}(X) = \sigma^2$, $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$.

Derivation 3: Normal PDF Integrates to 1

We prove the Gaussian integral $I = \int_{-\infty}^{\infty} e^{-x^2/2} dx = \sqrt{2\pi}$ using the famous trick of squaring:

$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2} \, dx \, dy$$

Convert to polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$, $dx\,dy = r\,dr\,d\theta$:

$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2} r \, dr \, d\theta = 2\pi \int_0^{\infty} r e^{-r^2/2} \, dr$$

Substituting $u = r^2/2$, $du = r\,dr$:

$$I^2 = 2\pi \int_0^{\infty} e^{-u} \, du = 2\pi \cdot 1 = 2\pi$$

Therefore $I = \sqrt{2\pi}$, confirming that $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ integrates to 1.
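As a sanity check, the Gaussian integral can also be evaluated numerically; a simple Riemann sum on a wide truncated grid (the neglected tails beyond $|x| = 10$ are of order $e^{-50}$) recovers $\sqrt{2\pi}$:

```python
# Numerical check of the Gaussian integral I = sqrt(2*pi).
import numpy as np

x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
I = np.exp(-x**2 / 2).sum() * dx   # Riemann sum; truncation error ~e^{-50}
print(I, np.sqrt(2 * np.pi))       # should agree to several decimals
```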

The 68-95-99.7 Rule

For $X \sim \mathcal{N}(\mu, \sigma^2)$: about 68.27% of values fall within $\mu \pm \sigma$, 95.45% within $\mu \pm 2\sigma$, and 99.73% within $\mu \pm 3\sigma$. This rule provides quick mental estimates for normal probabilities.
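These percentages follow from the standard normal CDF, which can be written via the error function as $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$:

```python
# The 68-95-99.7 rule computed from the error function.
from math import erf, sqrt

# P(|Z| < k) for a standard normal Z
within = {k: erf(k / sqrt(2)) for k in (1, 2, 3)}
print({k: round(100 * v, 2) for k, v in within.items()})
# {1: 68.27, 2: 95.45, 3: 99.73}
```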

4.4 Exponential Distribution

Exponential Distribution

$X \sim \text{Exponential}(\lambda)$: the waiting time for the first event in a Poisson process with rate $\lambda$.

$$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$

$E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$, $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$.

Derivation 4: The Memoryless Property

The exponential distribution is the only continuous distribution that is memoryless:

$$P(X > s + t \mid X > s) = P(X > t) \quad \text{for all } s, t \geq 0$$

Proof: Using the survival function $\bar{F}(x) = e^{-\lambda x}$:

$$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t)$$

Uniqueness: Suppose $\bar{F}(s+t) = \bar{F}(s)\bar{F}(t)$ for all $s, t \geq 0$ with $\bar{F}$ right-continuous and $\bar{F}(0) = 1$. Taking logarithms, $g(s+t) = g(s) + g(t)$ where $g = \ln \bar{F}$. By Cauchy's functional equation (with the monotonicity constraint from $\bar{F}$ being decreasing), $g(t) = -\lambda t$ for some $\lambda > 0$. Hence $\bar{F}(t) = e^{-\lambda t}$.
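The memoryless identity can be observed empirically: the conditional survival frequency $P(X > s+t \mid X > s)$ estimated from samples should match $e^{-\lambda t}$. The rate and the offsets $s, t$ below are arbitrary illustrative values in this Monte Carlo sketch:

```python
# Monte Carlo sketch of the memoryless property of Exponential(lam).
import numpy as np

rng = np.random.default_rng(1)
lam, s, t = 1.5, 0.7, 1.2
x = rng.exponential(1 / lam, size=1_000_000)

cond = (x > s + t).mean() / (x > s).mean()  # estimate of P(X > s+t | X > s)
direct = np.exp(-lam * t)                   # P(X > t) = e^{-lam*t}
print(cond, direct)                         # should nearly coincide
```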

4.5 Gamma and Beta Distributions

Gamma Distribution

$X \sim \text{Gamma}(\alpha, \beta)$:

$$f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$

where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} \, dt$. $E[X] = \alpha/\beta$, $\text{Var}(X) = \alpha/\beta^2$. Special cases: $\text{Gamma}(1, \lambda) = \text{Exp}(\lambda)$, $\text{Gamma}(n/2, 1/2) = \chi^2(n)$.
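The normalization and the mean formula can be checked by numerical integration of the density above (with illustrative parameters $\alpha = 3$, $\beta = 2$; the tail beyond $x = 40$ is of order $e^{-80}$):

```python
# Numerical check that the Gamma(alpha, beta) density integrates to 1
# and has mean alpha/beta. alpha = 3, beta = 2 are illustrative values.
import math
import numpy as np

alpha, beta = 3.0, 2.0
x = np.linspace(1e-9, 40.0, 400_001)
pdf = beta**alpha / math.gamma(alpha) * x**(alpha - 1) * np.exp(-beta * x)

dx = x[1] - x[0]
total = pdf.sum() * dx       # should be ~1
mean = (x * pdf).sum() * dx  # should be ~alpha/beta = 1.5
print(total, mean)
```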

Beta Distribution

$X \sim \text{Beta}(\alpha, \beta)$:

$$f_X(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < x < 1$$

where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. $E[X] = \alpha/(\alpha+\beta)$, $\text{Var}(X) = \alpha\beta / [(\alpha+\beta)^2(\alpha+\beta+1)]$. The Beta distribution is the conjugate prior for the Bernoulli and binomial likelihoods.
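A quick sampling check of the moment formulas (with illustrative parameters $\alpha = 2$, $\beta = 5$):

```python
# Sanity check of the Beta(alpha, beta) mean and variance formulas by sampling.
import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, 5.0
x = rng.beta(a, b, size=500_000)

mean_formula = a / (a + b)                        # 2/7
var_formula = a * b / ((a + b)**2 * (a + b + 1))  # 10/392
print(x.mean(), mean_formula)
print(x.var(), var_formula)
```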

Derivation 5: Sum of Exponentials is Gamma

If $X_1, \ldots, X_n$ are i.i.d. $\text{Exp}(\lambda)$, then $S_n = X_1 + \cdots + X_n \sim \text{Gamma}(n, \lambda)$.

Proof via MGF: The MGF of $\text{Exp}(\lambda)$ is $M(t) = \lambda/(\lambda - t)$. For the sum of independent variables:

$$M_{S_n}(t) = \prod_{i=1}^n M_{X_i}(t) = \left(\frac{\lambda}{\lambda - t}\right)^n$$

This is the MGF of $\text{Gamma}(n, \lambda)$. Since the MGF uniquely determines the distribution, $S_n \sim \text{Gamma}(n, \lambda)$. Physically, this is the waiting time for $n$ events in a Poisson process with rate $\lambda$.
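A sampling sketch confirms the first two moments: sums of $n$ i.i.d. exponentials should match the $\text{Gamma}(n, \lambda)$ mean $n/\lambda$ and variance $n/\lambda^2$ (here $n = 5$, $\lambda = 2$ are illustrative; note NumPy parameterizes the exponential by its scale $1/\lambda$):

```python
# Monte Carlo check: sums of exponentials match Gamma(n, lam) moments.
import numpy as np

rng = np.random.default_rng(2)
n, lam = 5, 2.0

# 200,000 sums of n i.i.d. Exp(lam) draws (NumPy uses scale = 1/lam)
s = rng.exponential(1 / lam, size=(200_000, n)).sum(axis=1)

print(s.mean(), n / lam)    # ~2.5
print(s.var(), n / lam**2)  # ~1.25
```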

4.6 Applications

Application 1: Queuing Theory

In an M/M/1 queue (Poisson arrivals, exponential service times, single server), arrivals follow $\text{Poisson}(\lambda)$ and service times are $\text{Exp}(\mu)$. The steady-state probability of $n$ customers in the system is $\pi_n = (1 - \rho)\rho^n$, where $\rho = \lambda/\mu < 1$ is the traffic intensity. The expected number in the system is $E[N] = \rho/(1-\rho)$.
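The quoted steady-state facts can be evaluated directly for sample rates $\lambda = 3$, $\mu = 4$ (illustrative values, so $\rho = 0.75$): the $\pi_n$ sum to 1 and $\sum_n n\,\pi_n$ reproduces $\rho/(1-\rho)$:

```python
# M/M/1 steady-state check: geometric pi_n sums to 1, E[N] = rho/(1-rho).
lam, mu = 3.0, 4.0
rho = lam / mu                                  # traffic intensity 0.75

pi = [(1 - rho) * rho**n for n in range(2000)]  # truncation error ~rho^2000
total = sum(pi)
EN = sum(n * p for n, p in enumerate(pi))
print(total, EN, rho / (1 - rho))               # ~1, ~3, 3
```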

Application 2: Radioactive Decay

The number of decay events per unit time follows a Poisson distribution, and the time between decays follows an exponential distribution. The half-life $t_{1/2}$ relates to the rate parameter via $t_{1/2} = \ln(2)/\lambda$. Carbon-14 dating relies on this model with $t_{1/2} \approx 5730$ years.
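The half-life relation is a one-line computation: with $\lambda = \ln 2 / t_{1/2}$, the surviving fraction after time $t$ is $e^{-\lambda t}$, which is exactly $1/2$ after one half-life:

```python
# Half-life arithmetic for carbon-14: lam = ln(2)/t_half.
from math import exp, log

t_half = 5730.0                          # carbon-14 half-life in years
lam = log(2) / t_half                    # decay rate per year

frac_after_one = exp(-lam * t_half)      # exactly 1/2
frac_after_two = exp(-lam * 2 * t_half)  # 1/4
print(lam, frac_after_one, frac_after_two)
```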

Application 3: Bayesian Inference with Beta Prior

The Beta-Binomial model is the simplest Bayesian conjugate model. If $p \sim \text{Beta}(\alpha, \beta)$ (prior) and $X \mid p \sim \text{Binomial}(n, p)$ (likelihood), then the posterior is:

$$p \mid X = k \sim \text{Beta}(\alpha + k, \beta + n - k)$$

The posterior mean is $(\alpha + k)/(\alpha + \beta + n)$, a weighted average of the prior mean $\alpha/(\alpha + \beta)$ and the sample proportion $k/n$.
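The update and the weighted-average identity can be verified with concrete numbers (the prior $\text{Beta}(2, 2)$ and data $k = 7$ successes in $n = 10$ trials below are illustrative):

```python
# Beta-Binomial conjugate update: posterior Beta(alpha + k, beta + n - k).
alpha, beta, n, k = 2.0, 2.0, 10, 7

post_a, post_b = alpha + k, beta + n - k  # posterior Beta(9, 5)
prior_mean = alpha / (alpha + beta)
post_mean = post_a / (post_a + post_b)

# Posterior mean as a weighted average of prior mean and sample proportion
w = (alpha + beta) / (alpha + beta + n)   # weight on the prior mean
blend = w * prior_mean + (1 - w) * (k / n)
print(post_mean, blend)                   # both equal 9/14
```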

Application 4: Insurance Claims

In actuarial science, the number of claims per period often follows a Poisson distribution, while claim amounts follow a Gamma or log-normal distribution. The total claims $S = \sum_{i=1}^N X_i$ (where $N \sim \text{Poisson}$ and $X_i \sim \text{Gamma}$) is a compound Poisson random variable, fundamental to ruin theory and premium calculation.
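A compound Poisson total can be simulated directly; by Wald's identity its mean is $E[S] = \lambda\, E[X_i] = \lambda \alpha/\beta$. The rates below are illustrative values in this Monte Carlo sketch:

```python
# Monte Carlo sketch of a compound Poisson total S = sum_{i=1}^N X_i.
import numpy as np

rng = np.random.default_rng(4)
lam, alpha, beta = 3.0, 2.0, 0.5     # E[X_i] = alpha/beta = 4
trials = 50_000

N = rng.poisson(lam, size=trials)    # claim counts
# NumPy's gamma takes (shape, scale), so scale = 1/beta
S = np.array([rng.gamma(alpha, 1 / beta, size=m).sum() for m in N])

print(S.mean(), lam * alpha / beta)  # Wald's identity: E[S] = 12
```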

4.7 Python Simulation

This simulation demonstrates the common distributions, the Poisson limit theorem, and the Beta-Binomial conjugate model.

Common Distributions: Properties and Relationships
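The original interactive script is not reproduced here; the following is a minimal stand-in sketch covering the three items the section lists: sampling the common distributions against their theoretical means, the Poisson limit of the binomial, and one Beta-Binomial update (all parameter values are illustrative):

```python
# Minimal stand-in sketch for the chapter's simulation script.
import numpy as np
from math import comb, exp, factorial

rng = np.random.default_rng(42)

# 1. Sample means of the common distributions vs. theoretical means.
samples = {
    "binomial":    (rng.binomial(10, 0.3, 50_000),    10 * 0.3),
    "poisson":     (rng.poisson(4.0, 50_000),         4.0),
    "uniform":     (rng.uniform(0, 1, 50_000),        0.5),
    "normal":      (rng.normal(2.0, 1.5, 50_000),     2.0),
    "exponential": (rng.exponential(0.5, 50_000),     0.5),   # scale = 1/lam
    "gamma":       (rng.gamma(3.0, 0.5, 50_000),      1.5),   # shape, scale
    "beta":        (rng.beta(2.0, 5.0, 50_000),       2 / 7),
}
max_gap = max(abs(x.mean() - m) for x, m in samples.values())

# 2. Poisson limit theorem: Binomial(n, lam/n) PMF at k -> Poisson(lam) PMF.
lam, k, n = 4.0, 2, 10_000
binom = comb(n, k) * (lam / n)**k * (1 - lam / n)**(n - k)
poisson = lam**k * exp(-lam) / factorial(k)

# 3. Beta-Binomial conjugacy: Beta(2, 2) prior, 7 successes in 10 trials.
post_mean = (2 + 7) / (2 + 2 + 10)

print(max_gap, abs(binom - poisson), post_mean)
```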


4.8 Summary and Key Takeaways

Discrete Distributions

Bernoulli (single trial), Binomial (sum of trials), and Poisson (rare events limit) form a natural hierarchy connected by limiting arguments.

Continuous Distributions

The normal distribution (Gaussian integral), exponential (memoryless waiting times), gamma (sum of exponentials), and beta (conjugate prior on [0,1]) are workhorses of statistics.

Relationships

Binomial approaches Poisson (many rare events) and Normal (CLT). Exponential is a special Gamma. Beta is the conjugate prior for Binomial. These connections form a rich web.
