Random Variables
Discrete and continuous random variables, their distributions, and transformations
Historical Context
The concept of a random variable emerged gradually in the history of probability. Early probabilists like Jacob Bernoulli and Abraham de Moivre worked directly with specific quantities like counts of successes without formalizing the underlying mapping. The term “random variable” was used informally throughout the 19th century, but the rigorous definition as a measurable function from a probability space to the real line was established by Kolmogorov in 1933. This abstraction was revolutionary: it separated the mathematical structure of a random quantity from the particular experiment generating it.
The moment generating function was developed by Laplace and later refined by the Russian school of Chebyshev, Markov, and Lyapunov, who used it to prove increasingly general forms of the central limit theorem. The characteristic function (Fourier transform of the distribution) was championed by Paul Levy and is now the standard tool for proving convergence results.
3.1 Definition and Measurability
Definition: Random Variable
A random variable is a measurable function $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. That is, for every Borel set $B \in \mathcal{B}(\mathbb{R})$:
$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}.$$
Equivalently, it suffices to check $\{X \leq x\} := \{\omega : X(\omega) \leq x\} \in \mathcal{F}$ for all $x \in \mathbb{R}$.
The measurability condition ensures that we can compute $P(X \in B) = P(X^{-1}(B))$ for any Borel set $B$. The mapping $B \mapsto P(X \in B)$ defines a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ called the distribution or law of $X$.
Random variables are classified as discrete (taking values in a finite or countable set) or continuous (having a probability density function). A random variable can also be mixed, having both discrete and continuous components.
3.2 PMF, PDF, and CDF
Cumulative Distribution Function
The CDF of a random variable $X$ is:
$$F_X(x) = P(X \leq x), \qquad x \in \mathbb{R}.$$
Every CDF satisfies: (i) $F$ is non-decreasing, (ii) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$, (iii) $F$ is right-continuous: $F(x) = \lim_{y \downarrow x} F(y)$.
Probability Mass Function (Discrete)
If $X$ takes values in a countable set $\{x_1, x_2, \ldots\}$, its PMF is:
$$p_X(x_i) = P(X = x_i).$$
The CDF is a step function: $F_X(x) = \sum_{x_i \leq x} p_X(x_i)$.
Probability Density Function (Continuous)
A random variable $X$ is continuous if there exists a non-negative function $f_X$ (the PDF) such that:
$$F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt \quad \text{for all } x \in \mathbb{R}.$$
At points of continuity of $f_X$: $f_X(x) = F_X'(x)$. Note that $f_X(x)$ is not a probability; it can exceed 1. Only $f_X(x) \, dx$ represents an infinitesimal probability.
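As a quick numerical illustration of this last point (a sketch using numpy/scipy; the Uniform(0, 1/2) example is chosen here for convenience and is not from the original text), the density below exceeds 1 everywhere on its support yet still integrates to 1:

```python
# A density can exceed 1: Uniform(0, 1/2) has f(x) = 2 on its support,
# yet the total area under the curve is still 1.
from scipy import stats
from scipy.integrate import quad

u = stats.uniform(loc=0.0, scale=0.5)      # Uniform on (0, 1/2)
print("f(0.25) =", u.pdf(0.25))            # 2.0 > 1
area, _ = quad(u.pdf, 0.0, 0.5)            # integrate over the support
print("total area =", round(area, 6))      # 1.0
```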
Derivation 1: CDF Determines the Distribution
The CDF uniquely determines the distribution of $X$. To see this, we can recover probabilities of all intervals from the CDF:
$$P(a < X \leq b) = F_X(b) - F_X(a), \qquad a < b.$$
Since the intervals $(a, b]$ generate $\mathcal{B}(\mathbb{R})$, the CDF determines $P(X \in B)$ for all Borel sets $B$ by the uniqueness theorem for measures (Caratheodory's extension theorem).
3.3 Expectation
Definition: Expected Value
For a discrete random variable: $E[X] = \sum_x x \, p_X(x)$
For a continuous random variable: $E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx$
provided the sum or integral converges absolutely.
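A minimal numerical sketch of both formulas (using numpy and scipy; a fair die and an Exponential(1) variable are illustrative choices, not examples from the original text):

```python
# Numerical illustration of the discrete and continuous expectation formulas.
import numpy as np
from scipy.integrate import quad

# Discrete: fair six-sided die, p(x) = 1/6 for x = 1..6
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
print("E[X] (die)    =", np.sum(values * pmf))          # 3.5

# Continuous: Exponential(rate = 1), f(x) = exp(-x) for x >= 0
mean, _ = quad(lambda x: x * np.exp(-x), 0, np.inf)
print("E[X] (Exp(1)) =", round(mean, 6))                 # 1.0
```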
Derivation 2: LOTUS (Law of the Unconscious Statistician)
If $g: \mathbb{R} \to \mathbb{R}$ is a measurable function and $Y = g(X)$, then we can compute $E[Y]$ without finding the distribution of $Y$:
$$E[g(X)] = \sum_x g(x) \, p_X(x) \ \ \text{(discrete)}, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f_X(x) \, dx \ \ \text{(continuous)}.$$
Proof sketch (continuous case): By the change of variables theorem for Lebesgue integrals, for any measurable $g$:
$$E[Y] = \int_\Omega g(X(\omega)) \, dP(\omega) = \int_{\mathbb{R}} g(x) \, dP_X(x) = \int_{\mathbb{R}} g(x) \, f_X(x) \, dx.$$
The first equality is the definition of expectation on the original probability space. The second is the transfer theorem (image measure). The third uses the density.
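A short Monte Carlo sketch of LOTUS (the choices $X \sim \mathcal{N}(0,1)$ and $g(x) = x^2$ are illustrative and not part of the original derivation):

```python
# LOTUS check: E[X^2] for X ~ N(0, 1) computed two ways --
# (i) the LOTUS integral  ∫ x^2 f_X(x) dx, and
# (ii) simulating Y = X^2 and averaging, without ever deriving f_Y.
import numpy as np
from scipy import stats
from scipy.integrate import quad

lotus, _ = quad(lambda x: x**2 * stats.norm.pdf(x), -np.inf, np.inf)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
mc = np.mean(x**2)

print("LOTUS integral:", round(lotus, 4))   # 1.0
print("Monte Carlo   :", round(mc, 4))      # ~1.0
```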
Properties of Expectation
- Linearity: $E[aX + bY] = aE[X] + bE[Y]$ for any constants $a, b$ (no independence required).
- $E[c] = c$ for any constant $c$.
- Monotonicity: if $X \leq Y$ almost surely, then $E[X] \leq E[Y]$.
- If $X$ and $Y$ are independent, then $E[XY] = E[X] \, E[Y]$.
3.4 Variance
Definition: Variance
$$\text{Var}(X) = E[(X - \mu)^2],$$
where $\mu = E[X]$. The standard deviation is $\sigma = \sqrt{\text{Var}(X)}$.
Derivation 3: Computational Formula for Variance
Expand the square in the definition:
$$\text{Var}(X) = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - (E[X])^2.$$
Properties:
- $\text{Var}(aX + b) = a^2 \text{Var}(X)$ (the additive constant drops out; the scale factor enters squared)
- $\text{Var}(X) \geq 0$ with equality iff $X$ is constant a.s.
- If $X, Y$ are independent: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
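A quick simulation sketch of these properties (the distributions and parameter values are illustrative):

```python
# Simulation check of the variance properties listed above.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)       # Var(X) = 4
Y = rng.normal(loc=0.0, scale=3.0, size=n)   # Var(Y) = 9, independent of X

a, b = -2.5, 7.0
print("Var(aX + b):", round(np.var(a * X + b), 2), " expected", a**2 * 4)   # ~25
print("Var(X + Y) :", round(np.var(X + Y), 2), " expected", 4 + 9)          # ~13
```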
Higher Moments
The $k$-th moment of $X$ is $E[X^k]$ and the $k$-th central moment is $E[(X - \mu)^k]$. Important special cases:
- The second central moment is the variance.
- The standardized third central moment, $E[(X - \mu)^3]/\sigma^3$, is the skewness (a measure of asymmetry).
- The standardized fourth central moment, $E[(X - \mu)^4]/\sigma^4$, is the kurtosis (a measure of tail heaviness).
3.5 Moment Generating Functions
Definition: MGF
The moment generating function of $X$ is:
$$M_X(t) = E[e^{tX}],$$
defined for all $t$ in an open interval containing 0 where the expectation exists and is finite.
Derivation 4: MGF Generates Moments
Expand $e^{tX}$ as a Taylor series:
$$M_X(t) = E\left[\sum_{k=0}^{\infty} \frac{(tX)^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k]$$
(interchanging sum and expectation is justified when the MGF exists in a neighborhood of 0). The $k$-th derivative at $t = 0$ gives the $k$-th moment:
$$M_X^{(k)}(0) = E[X^k].$$
Moreover, the MGF uniquely determines the distribution (when it exists in a neighborhood of zero). This is because the MGF determines all moments, and the moment problem has a unique solution for distributions with finite MGF.
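A symbolic sketch of this fact for the Exponential(rate $\lambda$) distribution, whose MGF is $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$ and whose moments are $E[X^k] = k!/\lambda^k$ (using sympy; the example is illustrative):

```python
# Derivatives of the MGF at t = 0 recover the moments of Exponential(rate = lam).
import sympy as sp

t, lam = sp.symbols("t lam", positive=True)
M = lam / (lam - t)                       # MGF of Exponential(rate = lam), t < lam

for k in range(1, 5):
    moment = sp.simplify(sp.diff(M, t, k).subs(t, 0))
    print(f"k = {k}: M^({k})(0) = {moment}   (expected {sp.factorial(k)}/lam**{k})")
```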
Key MGF Properties
- Affine transformation: $M_{aX + b}(t) = e^{bt} M_X(at)$.
- Independent sums: if $X$ and $Y$ are independent, $M_{X+Y}(t) = M_X(t) \, M_Y(t)$.
- Uniqueness: if two random variables have the same MGF on a neighborhood of 0, they have the same distribution.
3.6 Transformations of Random Variables
Derivation 5: Change of Variables Formula
Let $X$ be a continuous random variable with PDF $f_X$, and let $Y = g(X)$ where $g$ is a monotonic differentiable function with inverse $g^{-1}$. Then the PDF of $Y$ is:
$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|.$$
Proof: For $g$ strictly increasing:
$$F_Y(y) = P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y)).$$
Differentiating by the chain rule:
$$f_Y(y) = f_X(g^{-1}(y)) \, \frac{d}{dy} g^{-1}(y).$$
For $g$ strictly decreasing, the inequality reverses and we pick up a minus sign, hence the absolute value in the general formula.
Example: Log-Normal Distribution
If $X \sim \mathcal{N}(\mu, \sigma^2)$ and $Y = e^X$, then $g^{-1}(y) = \ln y$ and $dg^{-1}/dy = 1/y$. Therefore:
$$f_Y(y) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right), \qquad y > 0.$$
This is the log-normal distribution, widely used in finance (stock prices), biology (organism sizes), and environmental science (pollutant concentrations).
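A simulation sketch of this example (using numpy and scipy; the parameter values $\mu = 0.5$, $\sigma = 0.8$ are illustrative):

```python
# Exponentiate normal draws and compare against scipy's lognorm
# (parametrized as s = sigma, scale = exp(mu)).
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.8
rng = np.random.default_rng(2)
y = np.exp(rng.normal(mu, sigma, size=200_000))

ref = stats.lognorm(s=sigma, scale=np.exp(mu))
print("KS statistic  :", round(stats.kstest(y, ref.cdf).statistic, 4))  # close to 0
print("simulated mean:", round(y.mean(), 3), " theoretical:", round(ref.mean(), 3))
```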
The CDF Method (General Approach)
For non-monotonic transformations, we use the CDF method directly: compute $F_Y(y) = P(g(X) \leq y)$ by identifying the set $\{x : g(x) \leq y\}$ and integrating $f_X$ over that set.
Example: $Y = X^2$ where $X \sim \mathcal{N}(0, 1)$
For $y > 0$:
$$F_Y(y) = P(X^2 \leq y) = P(-\sqrt{y} \leq X \leq \sqrt{y}) = \Phi(\sqrt{y}) - \Phi(-\sqrt{y}) = 2\Phi(\sqrt{y}) - 1.$$
Differentiating:
$$f_Y(y) = \frac{1}{\sqrt{y}} \, \phi(\sqrt{y}) = \frac{1}{\sqrt{2\pi y}} \, e^{-y/2}, \qquad y > 0.$$
This is the $\chi^2(1)$ (chi-squared with 1 degree of freedom) distribution, equivalently $\text{Gamma}(1/2, 1/2)$.
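A short numerical check that the derived density agrees with scipy's chi-squared implementation (the evaluation points are illustrative):

```python
# The derived density f_Y(y) = exp(-y/2) / sqrt(2*pi*y) matches scipy's chi2(1) pdf.
import numpy as np
from scipy import stats

y = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
derived = np.exp(-y / 2) / np.sqrt(2 * np.pi * y)
print(np.allclose(derived, stats.chi2.pdf(y, df=1)))   # True
```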
3.7 Applications
Application 1: Quantile Functions and Simulation
The quantile function (inverse CDF) $F^{-1}(u) = \inf\{x : F(x) \geq u\}$ allows us to simulate random variables: if $U \sim \text{Uniform}(0,1)$, then $X = F^{-1}(U)$ has CDF $F$. This is the probability integral transform.
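A minimal inverse-transform sketch for the Exponential distribution, where $F^{-1}(u) = -\ln(1-u)/\lambda$ (the rate value is illustrative):

```python
# Inverse-CDF sampling: apply F^{-1} to Uniform(0,1) draws to get Exponential samples.
import numpy as np

rng = np.random.default_rng(4)
lam = 2.0
u = rng.uniform(size=500_000)
x = -np.log(1 - u) / lam

print("sample mean:", round(x.mean(), 4), "(theoretical", 1 / lam, ")")
print("sample var :", round(x.var(), 4), "(theoretical", 1 / lam**2, ")")
```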
Application 2: Risk Assessment
In finance, the Value-at-Risk (VaR) at level $\alpha$ is the $\alpha$-quantile of the loss distribution: $\text{VaR}_\alpha = F_L^{-1}(\alpha)$. This requires knowledge of the CDF, which depends on the choice of random variable model for losses. Common choices include the normal distribution (thin tails), the $t$-distribution (heavier tails), and extreme value distributions.
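A short sketch comparing VaR under normal and Student-$t$ loss models (the location, scale, and degrees of freedom below are illustrative, not calibrated values):

```python
# VaR as a quantile of the loss distribution: a heavier-tailed t model gives a
# larger 99% VaR than a normal model with the same location and scale.
from scipy import stats

alpha, mu, scale = 0.99, 0.0, 1.0
var_normal = stats.norm.ppf(alpha, loc=mu, scale=scale)
var_t5 = stats.t.ppf(alpha, df=5, loc=mu, scale=scale)
print("VaR_0.99 (normal) :", round(var_normal, 3))   # ~2.326
print("VaR_0.99 (t, 5 df):", round(var_t5, 3))       # ~3.365
```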
Application 3: Signal Processing
In signal processing, a noisy signal is modeled as $Y = X + N$ where $X$ is the signal and $N$ is noise. When $X$ and $N$ are independent, the distribution of $Y$ is the convolution of the distributions of $X$ and $N$, and the MGF factorizes: $M_Y(t) = M_X(t) \, M_N(t)$.
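A Monte Carlo sketch of this factorization for independent signal and noise (the normal distributions and the value of $t$ are illustrative choices):

```python
# Check that M_{X+N}(t) = M_X(t) * M_N(t) for independent X (signal) and N (noise).
import numpy as np

rng = np.random.default_rng(5)
n = 2_000_000
X = rng.normal(0.0, 1.0, size=n)   # signal
N = rng.normal(0.0, 0.5, size=n)   # noise, independent of X

t = 0.7
lhs = np.mean(np.exp(t * (X + N)))
rhs = np.mean(np.exp(t * X)) * np.mean(np.exp(t * N))
print("M_Y(t)       :", round(lhs, 4))
print("M_X(t)*M_N(t):", round(rhs, 4))   # should agree closely
```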
Application 4: Order Statistics in Extreme Value Analysis
Given $n$ i.i.d. random variables $X_1, \ldots, X_n$ with CDF $F$, the maximum $X_{(n)}$ has CDF $F_{(n)}(x) = [F(x)]^n$. This is the starting point for extreme value theory, which models the distribution of maxima of large samples and is critical in engineering (flood levels, wind speeds) and finance (extreme losses).
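A simulation sketch of the maximum's CDF for uniform samples (the choice $n = 10$ is illustrative):

```python
# Check F_{(n)}(x) = F(x)^n: simulate maxima of n Uniform(0,1) samples and compare
# the empirical P(max <= x) with x**n.
import numpy as np

rng = np.random.default_rng(6)
n, reps = 10, 200_000
maxima = rng.uniform(size=(reps, n)).max(axis=1)

for x in (0.5, 0.8, 0.95):
    print(f"x = {x}: empirical {np.mean(maxima <= x):.4f}   theory {x**n:.4f}")
```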
3.8 Python Simulation
This simulation explores random variable transformations, the probability integral transform, and moment generating functions.
Random Variables: Transformations and MGFs
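Since the interactive code itself is not reproduced here, the following is a sketch of what such a simulation might look like, covering the three topics listed above (using numpy and scipy; all parameter choices are illustrative, not the original code):

```python
# Sketch: random-variable transformations, the probability integral transform,
# and numerical moments from the MGF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200_000

# 1. Transformation: Y = exp(X) with X ~ N(0, 1) should be log-normal.
x = rng.standard_normal(n)
y = np.exp(x)
print("log-normal KS:", round(stats.kstest(y, stats.lognorm(s=1.0).cdf).statistic, 4))

# 2. Probability integral transform: F(X) is Uniform(0, 1); F^{-1}(U) has CDF F.
u = stats.norm.cdf(x)                              # forward transform of normal draws
print("uniform KS   :", round(stats.kstest(u, "uniform").statistic, 4))
x_from_u = stats.norm.ppf(rng.uniform(size=n))     # inverse transform sampling
print("normal KS    :", round(stats.kstest(x_from_u, "norm").statistic, 4))

# 3. MGF: estimate M_X(t) = E[e^{tX}] by Monte Carlo and compare with the
#    closed form exp(t^2 / 2) for the standard normal.
for t in (0.5, 1.0):
    print(f"t = {t}: MC {np.mean(np.exp(t * x)):.4f}   exact {np.exp(t**2 / 2):.4f}")
```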
3.9 Summary and Key Takeaways
Random Variables
A random variable is a measurable function from $\Omega$ to $\mathbb{R}$. Its distribution is completely characterized by the CDF $F_X(x) = P(X \leq x)$.
PMF, PDF, CDF
Discrete variables have PMFs, continuous variables have PDFs. Both are related to the CDF, which exists for all random variables.
Expectation and Variance
Expectation is linear; variance satisfies $\text{Var}(X) = E[X^2] - (E[X])^2$. LOTUS lets us compute $E[g(X)]$ without finding the distribution of $g(X)$.
MGF
The moment generating function $M_X(t) = E[e^{tX}]$ uniquely determines the distribution and provides moments via differentiation at $t = 0$.
Transformations
For monotonic transformations, the change-of-variables formula gives the PDF directly. For general transformations, use the CDF method.