Part II: Supervised Learning
Supervised learning is the foundation of applied machine learning: given labelled examples, find a function that generalises to new inputs. This part derives three landmark algorithms from first principles (linear regression, logistic regression, and support vector machines), establishing the statistical and geometric intuition that underpins every modern model.
Chapter 4: Linear Regression
OLS derivation from scratch, the geometric interpretation as projection, Ridge and Lasso regularisation, and the full bias-variance decomposition.
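As a small preview of the chapter's material, the OLS and Ridge estimators both have closed forms via the normal equations. The sketch below (synthetic data, NumPy only; the variable names and data are illustrative, not from the text) solves both and shows Ridge's shrinkage effect:

```python
import numpy as np

# Synthetic regression data (illustrative, not from the chapter)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# OLS: solve the normal equations X^T X w = X^T y
# (solve is preferred over explicitly inverting X^T X)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add lambda * I to the Gram matrix, shrinking w toward zero
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

Ridge's only change is the `lam * np.eye(3)` term, which also makes the system well-conditioned when `X.T @ X` is near-singular.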
Chapter 5: Logistic Regression
Sigmoid from log-odds, cross-entropy loss from Bernoulli MLE, gradient derivation, Newton's method / IRLS, and multi-class softmax.
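The chapter's gradient derivation yields the compact form X^T(p - y)/n for the mean cross-entropy loss; a minimal gradient-descent sketch on toy data (all names and the data-generation step are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data labelled by a known linear rule (illustrative)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (X @ true_w > 0).astype(float)

# Plain gradient descent on the mean cross-entropy loss
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w)          # predicted probabilities
    grad = X.T @ (p - y) / len(y)  # gradient of mean cross-entropy
    w -= lr * grad
```

Newton's method / IRLS, covered in the chapter, replaces the fixed step `lr` with the inverse Hessian and typically converges in far fewer iterations.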
Chapter 6: Support Vector Machines
Maximum margin derivation, Lagrangian dual formulation, KKT conditions, soft margin with slack variables, and the kernel trick.
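The kernel trick mentioned above rests on one identity: a kernel evaluates an inner product in a feature space without ever constructing that space. A small sketch for the degree-2 homogeneous polynomial kernel on 2-D inputs (the function names are illustrative):

```python
import numpy as np

def poly2_kernel(x, z):
    # Degree-2 homogeneous polynomial kernel: K(x, z) = (x . z)^2
    return (x @ z) ** 2

def phi(x):
    # Explicit feature map for 2-D inputs whose inner product equals K
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_implicit = poly2_kernel(x, z)   # O(d) work, no feature vector built
k_explicit = phi(x) @ phi(z)      # same value via the explicit map
```

Because the dual formulation depends on the data only through inner products, substituting `poly2_kernel` (or an RBF kernel, whose feature space is infinite-dimensional) into the dual trains a nonlinear classifier at linear-kernel cost.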