Probability & Statistics

A rigorous graduate-level course on probability and statistics—from measure-theoretic foundations and common distributions through Bayesian inference, maximum likelihood, MCMC methods, and causal inference.

Course Overview

Probability theory and mathematical statistics provide the language for reasoning under uncertainty. This course develops the theory from Kolmogorov's axioms through the central limit theorem, then builds modern statistical methodology including Bayesian inference, likelihood-based methods, computational techniques, and advanced topics in time series, multivariate analysis, and causal reasoning.

What You'll Learn

  • Probability axioms and conditional probability
  • Random variables and common distributions
  • Expectation, variance, and limit theorems
  • Bayesian inference and hypothesis testing
  • Maximum likelihood estimation
  • Regression analysis and MCMC methods
  • Time series and multivariate analysis
  • Nonparametric methods and causal inference

Prerequisites

  • Multivariable calculus
  • Linear algebra
  • Mathematical maturity (proof-based courses)
  • Basic programming (helpful for MCMC)

References

  • G. Casella & R. Berger, Statistical Inference
  • A. Gelman et al., Bayesian Data Analysis
  • L. Wasserman, All of Statistics
  • C. Bishop, Pattern Recognition and Machine Learning

Course Structure

Key Equations

Bayes' Theorem

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

The foundation of Bayesian inference: updating beliefs with evidence
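A minimal worked example of this update, using hypothetical diagnostic-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) chosen purely for illustration:

```python
# Hypothetical numbers: P(disease) = 0.01, P(+ | disease) = 0.95,
# P(+ | healthy) = 0.05.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Law of total probability gives the denominator P(B):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # prints 0.161
```

Even with a seemingly accurate test, the low prior drags the posterior down to about 16% — the classic base-rate effect.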

Central Limit Theorem

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Standardised sample means converge in distribution to a standard normal
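The convergence can be seen empirically: standardised means of a skewed distribution (here Exponential(1), so $\mu = \sigma = 1$) have sample moments close to those of $\mathcal{N}(0,1)$. The sample sizes below are illustrative choices.

```python
import math
import random

random.seed(0)

n, reps = 100, 2000
mu, sigma = 1.0, 1.0  # mean and sd of Exponential(1)

# Standardised sample means (X̄_n - mu) / (sigma / sqrt(n))
z = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z.append((xbar - mu) / (sigma / math.sqrt(n)))

# By the CLT these should look approximately N(0, 1):
mean_z = sum(z) / reps
var_z = sum(v * v for v in z) / reps - mean_z ** 2
print(round(mean_z, 2), round(var_z, 2))  # both close to 0 and 1
```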

Maximum Likelihood Estimator

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i \mid \theta)$$

The parameter value that maximises the likelihood of the observed data
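As a sketch, consider Bernoulli data (hypothetical coin flips): maximising the log-likelihood over a grid recovers the known closed-form MLE, the sample mean.

```python
import math

# Hypothetical data: 7 heads in 10 tosses
data = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]

def log_likelihood(theta, xs):
    """log of prod_i f(x_i | theta) for Bernoulli(theta)."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta)
               for x in xs)

# Maximise over a grid; analytically the MLE equals the sample mean.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))
print(theta_hat)  # prints 0.7, i.e. sum(data) / len(data)
```

Working with the log-likelihood is standard practice: it turns the product into a sum and avoids numerical underflow for large $n$.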

Chi-Squared Statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Measures the discrepancy between observed and expected frequencies
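A quick illustration with made-up die-roll counts, comparing 60 observed tosses against the fair-die expectation of 10 per face:

```python
# Hypothetical counts for faces 1-6 over 60 tosses
observed = [8, 9, 12, 11, 10, 10]
expected = [10] * 6  # fair die: 60 / 6 per face

# Chi-squared statistic: sum of (O_i - E_i)^2 / E_i
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # prints 1.0
```

Under the null hypothesis this statistic is compared against a $\chi^2$ distribution with $k - 1 = 5$ degrees of freedom; a value of 1.0 is far below typical critical values, so fairness would not be rejected here.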

Linear Regression

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$$

The ordinary least squares estimator for linear regression coefficients
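For a single predictor with an intercept, the normal equations $(X^\top X)\hat{\boldsymbol\beta} = X^\top \mathbf{y}$ reduce to a 2×2 system. A sketch on a toy data set (hypothetical values, generated exactly from $y = 1 + 2x$):

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x

n = len(xs)
# Entries of X^T X and X^T y, with design matrix rows [1, x_i]
sx = sum(xs)
sxx = sum(x * x for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 normal equations by Cramer's rule
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det  # intercept
b1 = (n * sxy - sx * sy) / det    # slope
print(b0, b1)  # prints 1.0 2.0
```

In practice one solves the normal equations (or better, uses a QR decomposition) rather than forming $(X^\top X)^{-1}$ explicitly, for numerical stability.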

Metropolis-Hastings

$$\alpha(\theta' \mid \theta) = \min\!\left(1, \; \frac{p(\theta') \, q(\theta \mid \theta')}{p(\theta) \, q(\theta' \mid \theta)}\right)$$

Acceptance probability for the Metropolis-Hastings MCMC algorithm
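A minimal random-walk sketch targeting a standard normal. Because the Gaussian proposal is symmetric, $q(\theta \mid \theta') = q(\theta' \mid \theta)$ and the proposal densities cancel in the ratio; all tuning constants below are illustrative.

```python
import math
import random

random.seed(1)

def p(theta):
    """Unnormalised target density: standard normal."""
    return math.exp(-theta * theta / 2)

theta = 0.0
samples = []
for _ in range(20000):
    proposal = theta + random.gauss(0, 1)       # symmetric random walk
    alpha = min(1.0, p(proposal) / p(theta))    # q terms cancel
    if random.random() < alpha:
        theta = proposal                        # accept
    samples.append(theta)                       # else keep current state

# Discard burn-in; remaining draws approximate N(0, 1)
draws = samples[5000:]
mean = sum(draws) / len(draws)
var = sum(d * d for d in draws) / len(draws) - mean ** 2
print(round(mean, 1), round(var, 1))  # near 0 and 1
```

Note that the normalising constant of $p$ is never needed — only ratios of densities appear — which is precisely what makes Metropolis-Hastings useful for posteriors known only up to proportionality.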