Part II: Supervised Learning
Supervised learning is the foundation of applied machine learning: given labelled examples, find a function that generalises to new inputs. This part derives three landmark algorithms from first principles (linear regression, logistic regression, and support vector machines), establishing the statistical and geometric intuition that underpins every modern model.
Chapter 4: Linear Regression
OLS derivation from scratch, the geometric interpretation as projection, Ridge and Lasso regularisation, and the full bias-variance decomposition.
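As a small preview of the chapter's material, the OLS and Ridge estimators both have closed forms via the normal equations. The sketch below (synthetic data, NumPy only; the variable names and data are illustrative, not from the text) solves both and shows Ridge's shrinkage effect:

```python
import numpy as np

# Synthetic regression data (illustrative, not from the chapter)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# OLS: solve the normal equations X^T X w = X^T y
# (solve is preferred over explicitly inverting X^T X)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add lambda * I to the Gram matrix, shrinking w toward zero
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

Ridge's only change is the `lam * np.eye(3)` term, which also makes the system well-conditioned when `X.T @ X` is near-singular.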
Chapter 5: Logistic Regression
Sigmoid from log-odds, cross-entropy loss from Bernoulli MLE, gradient derivation, Newton's method / IRLS, and multi-class softmax.
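The chapter's gradient derivation yields the compact form X^T(p - y)/n for the mean cross-entropy loss; a minimal gradient-descent sketch on toy data (all names and the data-generation step are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data labelled by a known linear rule (illustrative)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (X @ true_w > 0).astype(float)

# Plain gradient descent on the mean cross-entropy loss
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w)          # predicted probabilities
    grad = X.T @ (p - y) / len(y)  # gradient of mean cross-entropy
    w -= lr * grad
```

Newton's method / IRLS, covered in the chapter, replaces the fixed step `lr` with the inverse Hessian and typically converges in far fewer iterations.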
Chapter 6: Support Vector Machines
Maximum margin derivation, Lagrangian dual formulation, KKT conditions, soft margin with slack variables, and the kernel trick.
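The kernel trick mentioned above rests on one identity: a kernel evaluates an inner product in a feature space without ever constructing that space. A small sketch for the degree-2 homogeneous polynomial kernel on 2-D inputs (the function names are illustrative):

```python
import numpy as np

def poly2_kernel(x, z):
    # Degree-2 homogeneous polynomial kernel: K(x, z) = (x . z)^2
    return (x @ z) ** 2

def phi(x):
    # Explicit feature map for 2-D inputs whose inner product equals K
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_implicit = poly2_kernel(x, z)   # O(d) work, no feature vector built
k_explicit = phi(x) @ phi(z)      # same value via the explicit map
```

Because the dual formulation depends on the data only through inner products, substituting `poly2_kernel` (or an RBF kernel, whose feature space is infinite-dimensional) into the dual trains a nonlinear classifier at linear-kernel cost.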