Common Distributions
Bernoulli, binomial, Poisson, uniform, normal, exponential, gamma, and beta distributions
Historical Context
The common probability distributions emerged from specific practical problems. Jacob Bernoulli introduced the Bernoulli trial in his Ars Conjectandi (published 1713). Abraham de Moivre discovered the normal distribution around 1733 as an approximation to the binomial, though it is often attributed to Gauss (who applied it to astronomical errors in 1809). Siméon Denis Poisson derived his eponymous distribution in 1837 as a limit of the binomial for rare events. The exponential and gamma distributions arose from the study of waiting times and Bayesian inference, while the beta distribution traces back to Euler's study of the beta function and later proved essential in Bayesian statistics as the conjugate prior for binomial data.
4.1 Bernoulli and Binomial Distributions
Bernoulli Distribution
$X \sim \text{Bernoulli}(p)$: a single trial with probability $p$ of success.
$E[X] = p$, $\text{Var}(X) = p(1-p)$, $M_X(t) = 1 - p + pe^t$.
Binomial Distribution
$X \sim \text{Binomial}(n, p)$: the number of successes in $n$ independent Bernoulli trials, with PMF $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \ldots, n$.
Derivation 1: Binomial PMF from First Principles
Each specific sequence of $k$ successes and $n-k$ failures has probability $p^k(1-p)^{n-k}$ by independence. The number of such sequences is $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, hence the PMF.
Verification: $\sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = 1$ by the binomial theorem.
Mean: $E[X] = E\!\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i] = np$ by linearity, where $X_i \sim \text{Bernoulli}(p)$.
Variance: $\text{Var}(X) = \sum_{i=1}^n \text{Var}(X_i) = np(1-p)$ by independence.
MGF: $M_X(t) = \prod_{i=1}^n M_{X_i}(t) = (1 - p + pe^t)^n$.
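The PMF, normalization, mean, and variance above can be checked exactly for small $n$; a minimal sketch (the values $n = 10$, $p = 0.3$ are illustrative, not from the text):

```python
import math

# Exact check of the binomial PMF, mean, and variance
# (n = 10, p = 0.3 are illustrative choices).
n, p = 10, 0.3
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)                                          # 1, by the binomial theorem
mean = sum(k * q for k, q in enumerate(pmf))              # n*p = 3.0
var = sum((k - mean)**2 * q for k, q in enumerate(pmf))   # n*p*(1-p) = 2.1

print(total, mean, var)
```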
4.2 Poisson Distribution
Poisson Distribution
$X \sim \text{Poisson}(\lambda)$: the number of events in a fixed interval when events occur independently at a constant average rate $\lambda$.
$E[X] = \lambda$, $\text{Var}(X) = \lambda$, $M_X(t) = e^{\lambda(e^t - 1)}$.
Derivation 2: Poisson as Limit of Binomial
Let $X_n \sim \text{Binomial}(n, \lambda/n)$ where $\lambda$ is fixed and $n \to \infty$. We show $P(X_n = k) \to e^{-\lambda}\lambda^k/k!$:
$$P(X_n = k) = \binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-k}.$$
Factor by factor as $n \to \infty$:
$$\frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1, \qquad \left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}, \qquad \left(1 - \frac{\lambda}{n}\right)^{-k} \to 1.$$
Combining: $P(X_n = k) \to \frac{\lambda^k}{k!} \cdot e^{-\lambda} = \frac{\lambda^k e^{-\lambda}}{k!}$. This shows the Poisson arises naturally when there are many independent rare events.
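The convergence can also be observed numerically; a sketch with illustrative values $\lambda = 2$, $k = 3$, comparing the $\text{Binomial}(n, \lambda/n)$ PMF to its Poisson limit as $n$ grows:

```python
import math

lam, k = 2.0, 3   # illustrative rate and count

poisson_pmf = math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(n):
    """Binomial(n, lam/n) PMF at k."""
    p = lam / n
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

errors = [abs(binom_pmf(n) - poisson_pmf) for n in (10, 100, 1000, 10000)]
print(errors)  # the gap shrinks roughly like 1/n
```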
Poisson Process Connection
The Poisson distribution is intimately connected to the Poisson process. If events arrive according to a Poisson process with rate $\lambda$, then the number of events in any interval of length $t$ follows $\text{Poisson}(\lambda t)$. The inter-arrival times are independent $\text{Exponential}(\lambda)$ random variables. This connection links discrete counting distributions to continuous waiting-time distributions.
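This connection can be checked in simulation; a minimal sketch (rate, window length, and sample size are illustrative) that builds arrival times from $\text{Exponential}(\lambda)$ inter-arrivals and compares the mean count in $[0, t]$ to $\lambda t$:

```python
import random

random.seed(0)
lam, t, trials = 3.0, 2.0, 20000   # illustrative rate, window, sample size

def count_events():
    """Number of arrivals in [0, t] built from Exp(lam) inter-arrival times."""
    n, clock = 0, random.expovariate(lam)
    while clock <= t:
        n += 1
        clock += random.expovariate(lam)
    return n

counts = [count_events() for _ in range(trials)]
mean_count = sum(counts) / trials
print(mean_count)  # should be close to the Poisson(lam * t) mean, 6.0
```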
4.3 Uniform and Normal Distributions
Continuous Uniform Distribution
$X \sim \text{Uniform}(a, b)$: $f(x) = \frac{1}{b-a}$ for $a \leq x \leq b$, and $0$ otherwise.
$E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$, $M_X(t) = \frac{e^{tb} - e^{ta}}{t(b-a)}$ for $t \neq 0$ (with $M_X(0) = 1$).
Normal (Gaussian) Distribution
$X \sim \mathcal{N}(\mu, \sigma^2)$: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.
$E[X] = \mu$, $\text{Var}(X) = \sigma^2$, $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$.
Derivation 3: Normal PDF Integrates to 1
We prove the Gaussian integral $I = \int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi}$ using the famous trick of squaring:
$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2} \, dx \, dy.$$
Convert to polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$, $dx\,dy = r\,dr\,d\theta$:
$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2} \, r \, dr \, d\theta.$$
Substituting $u = r^2/2$, $du = r\,dr$:
$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-u} \, du \, d\theta = 2\pi.$$
Therefore $I = \sqrt{2\pi}$, confirming that $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ integrates to 1.
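The value $\sqrt{2\pi}$ can be confirmed by simple numerical quadrature; a midpoint-rule sketch (the grid size and truncation at $\pm 10$ are arbitrary choices, and the truncated tails are negligible):

```python
import math

# Midpoint-rule approximation of the Gaussian integral over [-10, 10];
# the tails beyond +/-10 contribute less than 1e-22.
a, b, n = -10.0, 10.0, 200000
h = (b - a) / n
I = h * sum(math.exp(-((a + (i + 0.5) * h) ** 2) / 2) for i in range(n))

print(I, math.sqrt(2 * math.pi))  # the two values agree to many digits
```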
The 68-95-99.7 Rule
For $X \sim \mathcal{N}(\mu, \sigma^2)$: about 68.27% of values fall within $\mu \pm \sigma$, 95.45% within $\mu \pm 2\sigma$, and 99.73% within $\mu \pm 3\sigma$. This rule provides quick mental estimates for normal probabilities.
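The three percentages follow from the standard normal CDF, which is expressible via the error function; a quick check:

```python
import math

def normal_within(k):
    """P(|X - mu| <= k*sigma) for any normal X, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 2))  # 68.27, 95.45, 99.73
```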
4.4 Exponential Distribution
Exponential Distribution
$X \sim \text{Exponential}(\lambda)$: the waiting time for the first event in a Poisson process with rate $\lambda$.
$E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$, $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$.
Derivation 4: The Memoryless Property
The exponential distribution is the only continuous distribution that is memoryless:
$$P(X > s + t \mid X > s) = P(X > t) \quad \text{for all } s, t \geq 0.$$
Proof: Using the survival function $\bar{F}(x) = P(X > x) = e^{-\lambda x}$:
$$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).$$
Uniqueness: Suppose $\bar{F}(s+t) = \bar{F}(s)\bar{F}(t)$ for all $s, t \geq 0$ with $\bar{F}$ right-continuous and $\bar{F}(0) = 1$. Taking logarithms, $g(s+t) = g(s) + g(t)$ where $g = \ln \bar{F}$. By Cauchy's functional equation (with the monotonicity constraint from $\bar{F}$ being decreasing), $g(t) = -\lambda t$ for some $\lambda > 0$. Hence $\bar{F}(t) = e^{-\lambda t}$.
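The memoryless property is easy to observe in simulation; a sketch with illustrative values of $\lambda$, $s$, and $t$, comparing the conditional and unconditional tail frequencies to $e^{-\lambda t}$:

```python
import math
import random

random.seed(1)
lam, s, t, n = 0.5, 1.0, 2.0, 200000   # illustrative rate, thresholds, sample size
xs = [random.expovariate(lam) for _ in range(n)]

survivors = [x for x in xs if x > s]
cond = sum(x > s + t for x in survivors) / len(survivors)   # P(X > s+t | X > s)
uncond = sum(x > t for x in xs) / n                          # P(X > t)

print(cond, uncond, math.exp(-lam * t))  # all three should nearly agree
```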
4.5 Gamma and Beta Distributions
Gamma Distribution
$X \sim \text{Gamma}(\alpha, \beta)$ (shape $\alpha$, rate $\beta$): $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x > 0$,
where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} \, dt$. $E[X] = \alpha/\beta$, $\text{Var}(X) = \alpha/\beta^2$. Special cases: $\text{Gamma}(1, \lambda) = \text{Exp}(\lambda)$, $\text{Gamma}(n/2, 1/2) = \chi^2(n)$.
Beta Distribution
$X \sim \text{Beta}(\alpha, \beta)$: $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ for $0 < x < 1$,
where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. $E[X] = \alpha/(\alpha+\beta)$, $\text{Var}(X) = \alpha\beta / [(\alpha+\beta)^2(\alpha+\beta+1)]$. The Beta distribution is the conjugate prior for the Bernoulli and binomial likelihoods.
Derivation 5: Sum of Exponentials is Gamma
If $X_1, \ldots, X_n$ are i.i.d. $\text{Exp}(\lambda)$, then $S_n = X_1 + \cdots + X_n \sim \text{Gamma}(n, \lambda)$.
Proof via MGF: The MGF of $\text{Exp}(\lambda)$ is $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$. For the sum of independent variables:
$$M_{S_n}(t) = \prod_{i=1}^n \frac{\lambda}{\lambda - t} = \left(\frac{\lambda}{\lambda - t}\right)^n.$$
This is the MGF of $\text{Gamma}(n, \lambda)$. Since the MGF uniquely determines the distribution, $S_n \sim \text{Gamma}(n, \lambda)$. Physically, this is the waiting time for $n$ events in a Poisson process with rate $\lambda$.
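A quick Monte Carlo check of the result (rate, number of summands, and sample size are illustrative): the sample mean and variance of $S_n$ should match the $\text{Gamma}(n, \lambda)$ values $n/\lambda$ and $n/\lambda^2$.

```python
import random

random.seed(2)
lam, n, trials = 2.0, 5, 100000   # illustrative rate, summands, sample size

# Each S_n is a sum of n independent Exp(lam) draws.
sums = [sum(random.expovariate(lam) for _ in range(n)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials

print(mean, var)  # Gamma(n, lam): mean n/lam = 2.5, variance n/lam^2 = 1.25
```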
4.6 Applications
Application 1: Queuing Theory
In an M/M/1 queue (Poisson arrivals, exponential service times, single server), arrivals follow $\text{Poisson}(\lambda)$ and service times are $\text{Exp}(\mu)$. The steady-state probability of $n$ customers in the system is $\pi_n = (1 - \rho)\rho^n$ where $\rho = \lambda/\mu < 1$ is the traffic intensity. The expected number in the system is $E[N] = \rho/(1-\rho)$.
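The steady-state formulas can be evaluated directly; a sketch with illustrative rates $\lambda = 2$, $\mu = 3$ (the geometric series is truncated, which is harmless since $\rho < 1$):

```python
# Steady-state M/M/1 quantities (arrival and service rates are illustrative).
lam, mu = 2.0, 3.0
rho = lam / mu                                   # traffic intensity, must be < 1

pi = [(1 - rho) * rho**n for n in range(200)]    # geometric steady-state probs
total = sum(pi)                                  # ~1 (truncation error is tiny)
EN = sum(n * p for n, p in enumerate(pi))        # expected number in system

print(rho, total, EN)  # EN should match rho / (1 - rho) = 2.0
```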
Application 2: Radioactive Decay
The number of decay events per unit time follows a Poisson distribution, and the time between decays follows an exponential distribution. The half-life $t_{1/2}$ relates to the rate parameter via $t_{1/2} = \ln(2)/\lambda$. Carbon-14 dating relies on this model with $t_{1/2} \approx 5730$ years.
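A small worked example of the rate relation, assuming the carbon-14 half-life above: compute $\lambda = \ln(2)/t_{1/2}$ and the fraction of C-14 remaining after a given time (the 20,000-year value is an illustrative choice).

```python
import math

half_life = 5730.0                  # carbon-14 half-life in years
lam = math.log(2) / half_life       # decay rate per year

# Fraction of the original C-14 remaining after a given time:
for years in (5730, 11460, 20000):
    remaining = math.exp(-lam * years)
    print(years, round(remaining, 4))  # one half-life -> 0.5, two -> 0.25
```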
Application 3: Bayesian Inference with Beta Prior
The Beta-Binomial model is the simplest Bayesian conjugate model. If $p \sim \text{Beta}(\alpha, \beta)$ (prior) and $X \mid p \sim \text{Binomial}(n, p)$ (likelihood), then after observing $X = k$ successes the posterior is $p \mid X = k \sim \text{Beta}(\alpha + k, \beta + n - k)$.
The posterior mean is $(\alpha + k)/(\alpha + \beta + n)$, a weighted average of the prior mean $\alpha/(\alpha + \beta)$ and the sample proportion $k/n$.
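The conjugate update is one line of arithmetic; a sketch with an illustrative $\text{Beta}(2, 2)$ prior and 7 successes in 10 trials, verifying the weighted-average form of the posterior mean:

```python
# Conjugate Beta-Binomial update (prior and data are illustrative).
alpha, beta = 2.0, 2.0          # Beta(2, 2) prior on p
n, k = 10, 7                    # observed: 7 successes in 10 trials

post_a, post_b = alpha + k, beta + n - k         # posterior is Beta(9, 5)
prior_mean = alpha / (alpha + beta)              # 0.5
post_mean = post_a / (post_a + post_b)           # (alpha + k) / (alpha + beta + n)
sample_prop = k / n                              # 0.7

# Posterior mean is a weighted average of prior mean and sample proportion,
# with weight (alpha + beta) / (alpha + beta + n) on the prior.
w = (alpha + beta) / (alpha + beta + n)
print(prior_mean, sample_prop, post_mean, w * prior_mean + (1 - w) * sample_prop)
```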
Application 4: Insurance Claims
In actuarial science, the number of claims per period often follows a Poisson distribution, while claim amounts follow a Gamma or log-normal distribution. The total claims $S = \sum_{i=1}^N X_i$ (where $N \sim \text{Poisson}$ and $X_i \sim \text{Gamma}$) is a compound Poisson random variable, fundamental to ruin theory and premium calculation.
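Compound Poisson totals satisfy Wald's identity $E[S] = E[N]\,E[X_i]$; a simulation sketch (all parameters illustrative):

```python
import random

random.seed(3)
lam, shape, rate, trials = 4.0, 2.0, 1.0, 50000   # all illustrative

def total_claims():
    """One period: N ~ Poisson(lam) claims, each claim ~ Gamma(shape, rate)."""
    # Sample N by counting Exp(lam) inter-arrivals inside a unit interval.
    n, clock = 0, random.expovariate(lam)
    while clock <= 1.0:
        n += 1
        clock += random.expovariate(lam)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return sum(random.gammavariate(shape, 1.0 / rate) for _ in range(n))

totals = [total_claims() for _ in range(trials)]
mean_S = sum(totals) / trials
print(mean_S)  # Wald's identity: E[S] = E[N] E[X] = lam * shape / rate = 8.0
```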
4.7 Python Simulation
This simulation demonstrates the common distributions, the Poisson limit theorem, and the Beta-Binomial conjugate model.
Common Distributions: Properties and Relationships
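A sketch of such a simulation, assuming only the standard library (all parameters illustrative): draw from each distribution in the chapter and compare sample means to theory. Poisson sampling uses Knuth's product-of-uniforms method, since `random` has no built-in Poisson generator.

```python
import math
import random

random.seed(42)
N = 50000  # samples per distribution (illustrative)

def poisson(lam):
    """Knuth's method: count uniforms until their product drops below e^{-lam}."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# (name, sampler, theoretical mean) for the chapter's distributions;
# Gamma uses the rate parameterization, so scale = 1 / rate.
dists = [
    ("Bernoulli(0.3)",   lambda: float(random.random() < 0.3),                  0.3),
    ("Binomial(10,0.3)", lambda: sum(random.random() < 0.3 for _ in range(10)), 3.0),
    ("Poisson(4)",       lambda: poisson(4.0),                                  4.0),
    ("Uniform(0,1)",     random.random,                                         0.5),
    ("Normal(0,1)",      lambda: random.gauss(0.0, 1.0),                        0.0),
    ("Exponential(2)",   lambda: random.expovariate(2.0),                       0.5),
    ("Gamma(3,2)",       lambda: random.gammavariate(3.0, 0.5),                 1.5),
    ("Beta(2,5)",        lambda: random.betavariate(2.0, 5.0),                  2.0 / 7.0),
]

results = {}
for name, draw, mu in dists:
    m = sum(draw() for _ in range(N)) / N
    results[name] = (m, mu)
    print(f"{name:17s} sample mean {m:8.4f}   theoretical {mu:8.4f}")
```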
4.8 Summary and Key Takeaways
Discrete Distributions
Bernoulli (single trial), Binomial (sum of trials), and Poisson (rare events limit) form a natural hierarchy connected by limiting arguments.
Continuous Distributions
The normal distribution (Gaussian integral), exponential (memoryless waiting times), gamma (sum of exponentials), and beta (conjugate prior on [0,1]) are workhorses of statistics.
Relationships
Binomial approaches Poisson (many rare events) and Normal (CLT). Exponential is a special Gamma. Beta is the conjugate prior for Binomial. These connections form a rich web.