Machine Learning

From linear regression to diffusion models: complete derivations, mathematical foundations, and Python implementations of every major algorithm.

The Machine Learning Landscape

The field divides into five major areas:

  • Supervised Learning: regression, classification, SVMs
  • Unsupervised Learning: clustering, PCA, autoencoders
  • Reinforcement Learning: Q-learning, policy gradients
  • Deep Learning: CNNs, RNNs, Transformers, diffusion models
  • Probabilistic ML: Bayesian methods, Gaussian processes, variational inference

The Equations That Define ML

Linear Regression (OLS)

\( \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \)
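A minimal NumPy sketch of the normal equations. The data here is a synthetic example (true coefficients 1 and 2), assumed for illustration; `np.linalg.solve` is used instead of an explicit inverse, which is numerically preferable.

```python
import numpy as np

# Synthetic data: y = 1 + 2x + noise (assumed toy example)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])  # design matrix with intercept
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 50)

# Normal equations: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0]
```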

Gradient Descent

\( \boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta \nabla_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}_t) \)
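The update rule in one dimension, on an assumed toy objective \( \mathcal{L}(\theta) = (\theta - 3)^2 \):

```python
# Minimize L(theta) = (theta - 3)^2 by repeatedly stepping
# against the gradient: theta <- theta - eta * L'(theta).
def grad(theta):
    return 2 * (theta - 3.0)

theta = 0.0
eta = 0.1  # learning rate
for _ in range(100):
    theta = theta - eta * grad(theta)

print(theta)  # converges toward the minimizer, 3.0
```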

Bayes' Theorem

\( P(\boldsymbol{\theta} \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \boldsymbol{\theta}) P(\boldsymbol{\theta})}{P(\mathcal{D})} \)
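Bayes' theorem applied literally, on a grid: a coin-flip example (7 heads, 3 tails, assumed for illustration) with a uniform prior over the bias \(\theta\). The evidence \(P(\mathcal{D})\) is just the normalizing sum.

```python
# Discretize theta on a grid and apply Bayes' theorem term by term.
thetas = [i / 100 for i in range(1, 100)]
prior = [1 / len(thetas)] * len(thetas)      # uniform prior P(theta)
heads, tails = 7, 3                          # observed data D (assumed example)

likelihood = [t**heads * (1 - t)**tails for t in thetas]  # P(D | theta)
unnorm = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnorm)                       # P(D), the normalizer
posterior = [u / evidence for u in unnorm]   # P(theta | D)

map_theta = thetas[posterior.index(max(posterior))]
print(map_theta)  # posterior mode at 0.7 = 7/10
```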

Cross-Entropy Loss

\( \mathcal{L} = -\sum_{i} \left[ y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i) \right] \)
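A direct NumPy translation, with predictions clipped away from 0 and 1 so the logarithms stay finite (the example labels and predictions are assumed):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip to avoid log(0), then sum the per-sample terms.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8])
print(binary_cross_entropy(y_true, y_pred))
```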

Backpropagation

\( \frac{\partial \mathcal{L}}{\partial w_{ij}^{(l)}} = \frac{\partial \mathcal{L}}{\partial z_j^{(l)}} \cdot a_i^{(l-1)} \)
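The rule can be sanity-checked numerically on the smallest possible case: one linear unit \( z = w\,a \) with squared loss (values here are assumed), comparing the backprop gradient against a finite-difference estimate.

```python
# Single linear unit z = w * a_prev, squared loss L = 0.5 * (z - t)^2.
a_prev, w, t = 1.5, 0.8, 2.0
z = w * a_prev
dL_dz = z - t                 # upstream gradient dL/dz
analytic = dL_dz * a_prev     # backprop rule: dL/dw = dL/dz * a_prev

# Finite-difference check of the same derivative
def L(w_):
    return 0.5 * (w_ * a_prev - t) ** 2

eps = 1e-6
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
print(analytic, numeric)  # the two estimates agree closely
```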

Attention (Transformer)

\( \text{Attention}(Q,K,V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V \)
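Scaled dot-product attention is a few lines of NumPy; the shapes below (4 tokens, dimension 8) are an assumed toy example, and the softmax subtracts the row max for numerical stability.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # scaled dot-product similarities
    return softmax(scores) @ V        # convex combination of value rows

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one output row per query
```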

ELBO (VAE)

\( \mathcal{L} = \mathbb{E}_{q}[\log p(\mathbf{x}|\mathbf{z})] - D_{\text{KL}}(q(\mathbf{z}|\mathbf{x}) \| p(\mathbf{z})) \)
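The KL term has a closed form for Gaussian \( q \), and sampling \( \mathbf{z} \) via the reparameterization trick (\( z = \mu + \sigma\epsilon \)) keeps it differentiable in \(\mu\) and \(\sigma\). A sketch with assumed encoder outputs, not a full VAE:

```python
import numpy as np

# Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)
rng = np.random.default_rng(0)
mu, log_var = 1.0, np.log(0.25)     # assumed encoder outputs for one latent dim
sigma = np.exp(0.5 * log_var)
eps = rng.standard_normal(10_000)
z = mu + sigma * eps                # samples from q(z|x) = N(mu, sigma^2)

# Closed-form KL term of the ELBO: D_KL(N(mu, sigma^2) || N(0, 1))
kl = 0.5 * (mu**2 + sigma**2 - 1.0 - log_var)
print(z.mean(), z.std(), kl)
```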

Bellman Equation (RL)

\( V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a) V^*(s') \right] \)
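Iterating the Bellman optimality backup to its fixed point is value iteration. Below, a two-state, two-action MDP whose transition and reward numbers are assumed for illustration:

```python
import numpy as np

# P[a, s, s'] = transition probability; R[s, a] = reward (assumed toy MDP).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.max(axis=1)

print(V)  # fixed point of the Bellman optimality operator
```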

About This Course

This course teaches machine learning from the mathematics up. Every algorithm is derived from first principles: we start with the objective function, take derivatives, and arrive at the update rules. No black boxes. Every chapter includes complete MathJax derivations, SVG architecture diagrams, and Python simulations that you can run in the browser.

The course spans from classical methods (linear regression, SVMs) through the deep learning revolution (CNNs, RNNs, Transformers) to the research frontier (diffusion models, LLMs, graph neural networks). Part V on Probabilistic ML connects to the Bayesian brain framework in our Music & Mathematics course and to our Information Theory course.

Prerequisites: multivariable calculus, linear algebra, basic probability. Chapters 1–3 provide a thorough review of all necessary mathematics.

Course Structure

Part I

Mathematical Foundations

The Language of ML

Linear algebra (vectors, matrices, eigendecomposition, SVD), probability theory (Bayes' theorem, distributions, MLE, MAP), and optimization (gradient descent, convexity, Lagrange multipliers, KKT conditions).
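As a taste of the linear-algebra toolkit: the SVD factors any matrix as \( U \Sigma V^\top \), and multiplying the factors back together recovers it exactly (the matrix here is an assumed example).

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending
U, s, Vt = np.linalg.svd(A, full_matrices=False)

A_rec = U @ np.diag(s) @ Vt  # reconstruction from the factors
print(np.allclose(A, A_rec))  # True
```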

Part II

Supervised Learning

Learning from Labels

Linear regression (OLS derivation, regularization, bias-variance tradeoff), logistic regression (cross-entropy, softmax, Newton's method), and SVMs (maximum margin, kernel trick, dual formulation).

Part III

Neural Networks

Deep Learning

The perceptron, backpropagation derivation (chain rule through computational graphs), deep architectures (BatchNorm, dropout, residual connections), and CNNs (convolution theorem, pooling, modern architectures).

Part IV

Unsupervised Learning

Finding Structure

K-means and Gaussian mixture models (EM algorithm derivation), PCA (eigenvalue formulation, kernel PCA, t-SNE), autoencoders and VAEs (ELBO derivation, reparameterization trick).
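K-means in particular fits in a dozen lines: alternate an assignment step and an update step. A sketch on two assumed well-separated Gaussian blobs, with a guard for the empty-cluster edge case:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster
        # (keep the old center if a cluster emptied out)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),   # blob near (0, 0)
               rng.normal(5.0, 0.3, (50, 2))])  # blob near (5, 5)
centers, labels = kmeans(X, 2)
print(np.sort(centers[:, 0]))  # one center near 0, the other near 5
```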

Part V

Probabilistic ML

Uncertainty Quantification

Bayesian inference (prior → posterior, conjugacy, MCMC), Gaussian processes (kernel functions, predictive distribution), and variational inference (ELBO, mean-field, amortized inference). Cross-links to the Bayesian brain in music perception.
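For a preview of GP regression, the predictive mean at test inputs is \( k_*^\top K^{-1} \mathbf{y} \). A one-dimensional sketch with an RBF kernel, fitting assumed sine-function observations (predictive variance omitted for brevity):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # RBF kernel on 1-D inputs: k(x, x') = exp(-(x - x')^2 / (2 ell^2))
    return np.exp(-0.5 * (a[:, None] - b[None]) ** 2 / ell**2)

X = np.array([-2.0, 0.0, 2.0])   # training inputs (assumed example)
y = np.sin(X)                     # noiseless observations of sin(x)
noise = 1e-6                      # tiny jitter for numerical stability

K = rbf(X, X) + noise * np.eye(len(X))
Xs = np.array([0.0, 1.0])         # test inputs
Ks = rbf(Xs, X)

mean = Ks @ np.linalg.solve(K, y)  # predictive mean k_*^T K^{-1} y
print(mean)  # interpolates the data at the training point x = 0
```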

Part VI

Sequence Models

Language & Time

RNNs (BPTT derivation, LSTM/GRU gating), the attention mechanism (scaled dot-product, multi-head), Transformers (positional encoding, layer norm), and LLMs (GPT, BERT, scaling laws, RLHF).

Part VII

Advanced Topics

Research Frontier

Reinforcement learning (Bellman equation, policy gradient, PPO), graph neural networks (message passing, spectral convolution), and diffusion models (forward/reverse process, score matching, classifier-free guidance).

Recommended Textbooks

  • Pattern Recognition and Machine Learning, Christopher Bishop (2006)
  • The Elements of Statistical Learning, Hastie, Tibshirani & Friedman (2009)
  • Deep Learning, Goodfellow, Bengio & Courville (2016)
  • Mathematics for Machine Learning, Deisenroth, Faisal & Ong (2020)
  • Probabilistic Machine Learning, Kevin Murphy (2022, 2023)
  • Reinforcement Learning: An Introduction, Sutton & Barto (2018)