Part I: Probability | Chapter 3

Random Variables

Discrete and continuous random variables, their distributions, and transformations

Historical Context

The concept of a random variable emerged gradually in the history of probability. Early probabilists like Jacob Bernoulli and Abraham de Moivre worked directly with specific quantities like counts of successes without formalizing the underlying mapping. The term “random variable” was used informally throughout the 19th century, but the rigorous definition as a measurable function from a probability space to the real line was established by Kolmogorov in 1933. This abstraction was revolutionary: it separated the mathematical structure of a random quantity from the particular experiment generating it.

The moment generating function was developed by Laplace and later refined by the Russian school of Chebyshev, Markov, and Lyapunov, who used it to prove increasingly general forms of the central limit theorem. The characteristic function (Fourier transform of the distribution) was championed by Paul Lévy and is now the standard tool for proving convergence results.

3.1 Definition and Measurability

Definition: Random Variable

A random variable is a measurable function $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. That is, for every Borel set $B \in \mathcal{B}(\mathbb{R})$:

$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$$

Equivalently, it suffices to check $\{X \leq x\} := \{\omega : X(\omega) \leq x\} \in \mathcal{F}$ for all $x \in \mathbb{R}$.

The measurability condition ensures that we can compute $P(X \in B) = P(X^{-1}(B))$ for any Borel set $B$. The mapping $B \mapsto P(X \in B)$ defines a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ called the distribution or law of $X$.

Random variables are classified as discrete (taking values in a finite or countable set) or continuous (having a probability density function). A random variable can also be mixed, having both discrete and continuous components.

3.2 PMF, PDF, and CDF

Cumulative Distribution Function

The CDF of a random variable $X$ is:

$$F_X(x) = P(X \leq x), \quad x \in \mathbb{R}$$

Every CDF satisfies: (i) $F$ is non-decreasing, (ii) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$, (iii) $F$ is right-continuous: $F(x) = \lim_{y \downarrow x} F(y)$.

Probability Mass Function (Discrete)

If $X$ takes values in a countable set $\{x_1, x_2, \ldots\}$, its PMF is:

$$p_X(x) = P(X = x), \quad \text{with } \sum_i p_X(x_i) = 1$$

The CDF is a step function: $F_X(x) = \sum_{x_i \leq x} p_X(x_i)$.
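The step-function CDF can be checked numerically; the following sketch uses a hypothetical four-point PMF (the values and probabilities are illustrative, not from the text):

```python
import numpy as np

# Hypothetical discrete variable with support {1, 2, 3, 4}.
values = np.array([1, 2, 3, 4])
pmf = np.array([0.1, 0.2, 0.3, 0.4])   # must sum to 1

def cdf(x):
    """F_X(x) = sum of p_X(x_i) over all x_i <= x (a step function)."""
    return pmf[values <= x].sum()

print(cdf(0.5))  # below all support points: 0
print(cdf(2))    # jump at x = 2 is included (right-continuity): 0.1 + 0.2
print(cdf(10))   # above all support points: 1
```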

Probability Density Function (Continuous)

A random variable $X$ is continuous if there exists a non-negative function $f_X$ (the PDF) such that:

$$F_X(x) = \int_{-\infty}^x f_X(t) \, dt$$

At points of continuity of $f_X$: $f_X(x) = F_X'(x)$. Note that $f_X(x)$ is not a probability; it can exceed 1. Only $f_X(x) \, dx$ represents an infinitesimal probability.
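To see that a density can exceed 1 while still integrating to 1, consider a uniform density on a short interval (a hypothetical Uniform(0, 1/2) example):

```python
import numpy as np

# Uniform(0, 1/2) has density f(x) = 2 on (0, 1/2) -- greater than 1.
f = lambda x: np.where((x > 0) & (x < 0.5), 2.0, 0.0)

# Riemann-sum check that f still integrates to 1.
xs = np.linspace(-1, 1, 200001)
dx = xs[1] - xs[0]
area = f(xs).sum() * dx

print(f(0.25))         # 2.0, a valid density value above 1
print(round(area, 3))  # approximately 1.0
```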

Derivation 1: CDF Determines the Distribution

The CDF uniquely determines the distribution of $X$. To see this, we can recover probabilities of all intervals from the CDF:

$$P(a < X \leq b) = F_X(b) - F_X(a)$$
$$P(X > a) = 1 - F_X(a)$$
$$P(X = a) = F_X(a) - \lim_{x \uparrow a} F_X(x) = F_X(a) - F_X(a^-)$$

Since the intervals $(a, b]$ generate $\mathcal{B}(\mathbb{R})$, the CDF determines $P(X \in B)$ for all Borel sets $B$ by the uniqueness theorem for measures (Carathéodory's extension theorem).
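These interval identities are easy to verify numerically; the sketch below assumes an Exponential(1) variable, whose CDF $F(x) = 1 - e^{-x}$ is available in closed form:

```python
import math

# CDF of an (assumed) Exponential(1) variable.
F = lambda x: 1 - math.exp(-x) if x >= 0 else 0.0

a, b = 0.5, 2.0
p_interval = F(b) - F(a)   # P(a < X <= b)
p_tail = 1 - F(a)          # P(X > a)
# F is continuous here, so P(X = a) = F(a) - F(a^-) = 0.

print(p_interval, p_tail)
```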

3.3 Expectation

Definition: Expected Value

For a discrete random variable: $E[X] = \sum_x x \, p_X(x)$

For a continuous random variable: $E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx$

provided the sum or integral converges absolutely.

Derivation 2: LOTUS (Law of the Unconscious Statistician)

If $g: \mathbb{R} \to \mathbb{R}$ is a measurable function and $Y = g(X)$, then we can compute $E[Y]$ without finding the distribution of $Y$:

$$E[g(X)] = \begin{cases} \sum_x g(x) \, p_X(x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} g(x) \, f_X(x) \, dx & \text{(continuous)} \end{cases}$$

Proof sketch (continuous case): By the change of variables theorem for Lebesgue integrals, for any measurable $g$:

$$E[g(X)] = \int_{\Omega} g(X(\omega)) \, dP(\omega) = \int_{\mathbb{R}} g(x) \, dF_X(x) = \int_{\mathbb{R}} g(x) f_X(x) \, dx$$

The first equality is the definition of expectation on the original probability space. The second is the transfer theorem (image measure). The third uses the density.
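A quick Monte Carlo sanity check of LOTUS (a sketch with assumed choices $X \sim \text{Uniform}(0,1)$ and $g(x) = x^2$, for which the exact integral $\int_0^1 x^2\,dx = 1/3$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1_000_000)

# E[g(X)] computed directly from samples of X -- no need for the law of X^2.
mc = (x ** 2).mean()
exact = 1 / 3
print(mc, exact)   # the two should agree to a few decimal places
```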

Properties of Expectation

Linearity: $E[aX + bY] = aE[X] + bE[Y]$ for any constants $a, b$ and any random variables $X, Y$ (no independence required).
Monotonicity: If $X \leq Y$ almost surely, then $E[X] \leq E[Y]$.
Triangle inequality: $|E[X]| \leq E[|X|]$.
Independence: If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$.

3.4 Variance

Definition: Variance

$$\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$

where $\mu = E[X]$. The standard deviation is $\sigma = \sqrt{\text{Var}(X)}$.

Derivation 3: Computational Formula for Variance

Expand the square in the definition:

$$\text{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2]$$
$$= E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$$

Properties:

  • $\text{Var}(aX + b) = a^2 \text{Var}(X)$ (additive constants drop out; the scale factor is squared)
  • $\text{Var}(X) \geq 0$ with equality iff $X$ is constant a.s.
  • If $X, Y$ are independent: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
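The computational formula and the scaling property can both be checked on simulated data (an illustrative sketch using Exponential(1), which has variance 1):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, 500_000)

var_def = ((x - x.mean()) ** 2).mean()       # E[(X - mu)^2]
var_comp = (x ** 2).mean() - x.mean() ** 2   # E[X^2] - (E[X])^2
var_scaled = np.var(3 * x + 7)               # should equal 9 * Var(X)

print(var_def, var_comp, var_scaled)         # ~1, ~1, ~9
```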

Higher Moments

The $k$-th moment of $X$ is $E[X^k]$ and the $k$-th central moment is $E[(X - \mu)^k]$. Important special cases:

Skewness ($k = 3$): $\gamma_1 = E[(X-\mu)^3] / \sigma^3$ measures asymmetry of the distribution.
Kurtosis ($k = 4$): $\gamma_2 = E[(X-\mu)^4] / \sigma^4$ measures the heaviness of tails. The normal distribution has kurtosis 3 (excess kurtosis 0).
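Sample versions of these standardized moments are straightforward to compute (an illustrative sketch; the exact values for Exponential(1), skewness 2 and kurtosis 9, are standard):

```python
import numpy as np

rng = np.random.default_rng(2)

def skew_kurt(x):
    """Sample skewness and kurtosis via standardized central moments."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean(), (z ** 4).mean()

s_norm, k_norm = skew_kurt(rng.normal(size=1_000_000))
s_exp, k_exp = skew_kurt(rng.exponential(size=1_000_000))

print(s_norm, k_norm)   # ~ 0, ~ 3 for the normal
print(s_exp, k_exp)     # ~ 2, ~ 9 for Exponential(1)
```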

3.5 Moment Generating Functions

Definition: MGF

The moment generating function of $X$ is:

$$M_X(t) = E[e^{tX}]$$

defined for all $t$ in an open interval containing 0 where the expectation exists and is finite.

Derivation 4: MGF Generates Moments

Expand $e^{tX}$ as a Taylor series:

$$M_X(t) = E[e^{tX}] = E\!\left[\sum_{k=0}^{\infty} \frac{(tX)^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k]$$

(interchanging sum and expectation is justified when the MGF exists in a neighborhood of 0). The $k$-th derivative at $t = 0$ gives the $k$-th moment:

$$M_X^{(k)}(0) = E[X^k]$$

Moreover, the MGF uniquely determines the distribution (when it exists in a neighborhood of zero). This is because a finite MGF near zero forces the moments to grow slowly enough that they determine the distribution uniquely (the moment problem is determinate in this case).

Key MGF Properties

Linear transformation: If $Y = aX + b$, then $M_Y(t) = e^{bt} M_X(at)$.
Sum of independent variables: If $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$.
Convergence: If $M_{X_n}(t) \to M_X(t)$ for all $t$ in a neighborhood of 0, then $X_n \xrightarrow{d} X$ (continuity theorem).
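The sum and derivative properties can be checked against an empirical MGF (a Monte Carlo sketch with assumed normal inputs; the finite-difference step $h$ is an arbitrary choice):

```python
import numpy as np

# Assumed inputs: X ~ N(0,1) and Y ~ N(1,1), independent.
rng = np.random.default_rng(3)
x = rng.normal(0, 1, 1_000_000)
y = rng.normal(1, 1, 1_000_000)

# Empirical MGF of a sample s evaluated at t.
M = lambda s, t: np.exp(t * s).mean()

t = 0.5
lhs = M(x + y, t)         # M_{X+Y}(t)
rhs = M(x, t) * M(y, t)   # M_X(t) M_Y(t) -- should nearly agree
print(lhs, rhs)

# Central difference of M_X at 0 approximates the first moment E[X].
h = 1e-4
deriv = (M(x, h) - M(x, -h)) / (2 * h)
print(deriv, x.mean())
```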

3.6 Transformations of Random Variables

Derivation 5: Change of Variables Formula

Let $X$ be a continuous random variable with PDF $f_X$, and let $Y = g(X)$ where $g$ is a monotonic differentiable function with inverse $g^{-1}$. Then the PDF of $Y$ is:

$$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right|$$

Proof: For $g$ strictly increasing:

$$F_Y(y) = P(Y \leq y) = P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))$$

Differentiating by the chain rule:

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy} g^{-1}(y)$$

For $g$ strictly decreasing, the inequality reverses and we pick up a minus sign, hence the absolute value in the general formula.

Example: Log-Normal Distribution

If $X \sim \mathcal{N}(\mu, \sigma^2)$ and $Y = e^X$, then $g^{-1}(y) = \ln y$ and $dg^{-1}/dy = 1/y$. Therefore:

$$f_Y(y) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\!\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right), \quad y > 0$$

This is the log-normal distribution, widely used in finance (stock prices), biology (organism sizes), and environmental science (pollutant concentrations).
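A histogram of simulated $e^X$ values should track the derived density (a sketch with assumed parameters $\mu = 0$, $\sigma = 0.5$):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 0.0, 0.5
y = np.exp(rng.normal(mu, sigma, 1_000_000))   # Y = e^X

def lognormal_pdf(y):
    """The density derived via the change-of-variables formula."""
    return np.exp(-(np.log(y) - mu) ** 2 / (2 * sigma ** 2)) / (
        y * sigma * np.sqrt(2 * np.pi))

# Compare a normalized histogram to the formula on (0.2, 3.0).
hist, edges = np.histogram(y, bins=100, range=(0.2, 3.0), density=True)
centers = (edges[:-1] + edges[1:]) / 2
max_err = np.abs(hist - lognormal_pdf(centers)).max()
print(max_err)   # small: histogram matches the derived density
```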

The CDF Method (General Approach)

For non-monotonic transformations, we use the CDF method directly: compute $F_Y(y) = P(g(X) \leq y)$ by identifying the set $\{x : g(x) \leq y\}$ and integrating $f_X$ over that set.

Example: $Y = X^2$ where $X \sim \mathcal{N}(0, 1)$

For $y > 0$:

$$F_Y(y) = P(X^2 \leq y) = P(-\sqrt{y} \leq X \leq \sqrt{y}) = \Phi(\sqrt{y}) - \Phi(-\sqrt{y}) = 2\Phi(\sqrt{y}) - 1$$

Differentiating:

$$f_Y(y) = 2\phi(\sqrt{y}) \cdot \frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \quad y > 0$$

This is the $\chi^2(1)$ (chi-squared with 1 degree of freedom) distribution, equivalently $\text{Gamma}(1/2, 1/2)$.
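As a numerical check, squared standard normals should reproduce the $\chi^2(1)$ mean 1 and variance 2:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=1_000_000) ** 2   # Y = X^2 with X ~ N(0,1)

print(y.mean(), y.var())   # ~ 1 and ~ 2, the chi-squared(1) moments
```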

3.7 Applications

Application 1: Quantile Functions and Simulation

The quantile function (inverse CDF) $F^{-1}(u) = \inf\{x : F(x) \geq u\}$ allows us to simulate random variables: if $U \sim \text{Uniform}(0,1)$, then $X = F^{-1}(U)$ has CDF $F$. This is the probability integral transform.
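For Exponential($\lambda$), the inverse CDF is $F^{-1}(u) = -\ln(1-u)/\lambda$, so the transform can be implemented directly (a sketch with an assumed rate of 2):

```python
import numpy as np

rng = np.random.default_rng(6)
rate = 2.0   # assumed rate parameter

# Inverse-transform sampling: X = F^{-1}(U) with U ~ Uniform(0,1).
u = rng.uniform(size=1_000_000)
x = -np.log(1 - u) / rate   # Exponential(rate) draws

print(x.mean())   # should be close to 1/rate = 0.5
```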

Application 2: Risk Assessment

In finance, the Value-at-Risk (VaR) at level $\alpha$ is the $\alpha$-quantile of the loss distribution: $\text{VaR}_\alpha = F_L^{-1}(\alpha)$. This requires knowledge of the CDF, which depends on the choice of random variable model for losses. Common choices include the normal distribution (thin tails), $t$-distribution (heavier tails), and extreme value distributions.

Application 3: Signal Processing

In signal processing, a noisy signal is modeled as $Y = X + N$ where $X$ is the signal and $N$ is noise. The distribution of $Y$ is the convolution of the distributions of $X$ and $N$; when $X$ and $N$ are independent, it can be computed via MGFs: $M_Y(t) = M_X(t) M_N(t)$.

Application 4: Order Statistics in Extreme Value Analysis

Given $n$ i.i.d. random variables $X_1, \ldots, X_n$ with CDF $F$, the maximum $X_{(n)}$ has CDF $F_{(n)}(x) = [F(x)]^n$. This is the starting point for extreme value theory, which models the distribution of maxima of large samples and is critical in engineering (flood levels, wind speeds) and finance (extreme losses).
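The formula $[F(x)]^n$ is easy to verify by simulation (a sketch using Uniform(0,1) maxima with an assumed $n = 5$):

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 5, 200_000

# Maximum of n i.i.d. Uniform(0,1) draws, repeated over many trials.
maxima = rng.uniform(size=(trials, n)).max(axis=1)

empirical = (maxima <= 0.9).mean()   # P(X_(n) <= 0.9)
print(empirical, 0.9 ** n)           # both ~ 0.590
```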

3.8 Python Simulation

This simulation explores random variable transformations, the probability integral transform, and moment generating functions.

Random Variables: Transformations and MGFs
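The original interactive script is not reproduced here; the following self-contained sketch covers the same three themes (transformation to the log-normal, the probability integral transform, and the MGF product rule), with all parameter choices being illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 500_000

# 1. Transformation: Y = exp(X) with X ~ N(0,1) is log-normal, and
#    E[Y] = M_X(1) = exp(1/2) by the normal MGF.
y = np.exp(rng.normal(size=N))
print("E[e^X]:", y.mean(), "theory:", np.exp(0.5))

# 2. Probability integral transform: F(X) ~ Uniform(0,1) for continuous X.
#    Here F is the Exponential(1) CDF applied to Exponential(1) draws.
x = rng.exponential(size=N)
u = 1 - np.exp(-x)
print("mean/var of F(X):", u.mean(), u.var(), "theory: 0.5, 1/12")

# 3. MGF of a sum: empirical M_{X1+X2}(t) vs M_X(t)^2 for i.i.d. N(0,1).
t = 0.3
x1, x2 = rng.normal(size=N), rng.normal(size=N)
lhs = np.exp(t * (x1 + x2)).mean()
rhs = np.exp(t * x1).mean() * np.exp(t * x2).mean()
print("M_{X1+X2}(t) vs M_X(t)^2:", lhs, rhs)
```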


3.9 Summary and Key Takeaways

Random Variables

A random variable is a measurable function from $\Omega$ to $\mathbb{R}$. Its distribution is completely characterized by the CDF $F_X(x) = P(X \leq x)$.

PMF, PDF, CDF

Discrete variables have PMFs, continuous variables have PDFs. Both are related to the CDF, which exists for all random variables.

Expectation and Variance

Expectation is linear; variance satisfies $\text{Var}(X) = E[X^2] - (E[X])^2$. LOTUS lets us compute $E[g(X)]$ without finding the distribution of $g(X)$.

MGF

The moment generating function $M_X(t) = E[e^{tX}]$ uniquely determines the distribution and provides moments via differentiation at $t = 0$.

Transformations

For monotonic transformations, the change-of-variables formula gives the PDF directly. For general transformations, use the CDF method.
