Machine Learning

Part VII: Advanced Topics

The frontier of modern machine learning: agents that learn by interacting with environments, neural networks that reason over structured relational data, and generative models that learn to sculpt structure out of pure noise. Each chapter delivers full mathematical derivations, SVG diagrams, and runnable Python simulations.

Chapter 19: Reinforcement Learning

MDPs, Bellman equations, Q-learning, policy gradients (REINFORCE), Actor-Critic, and PPO, with full derivations from the ground up.

- MDP formulation: (S, A, P, R, γ)
- Value function V^π(s) and Bellman equation derivation
- Action-value Q^π(s, a) and optimal Bellman operator
- Q-learning: off-policy TD update rule
- REINFORCE: log-derivative trick → policy gradient
- Actor-Critic and PPO clipped surrogate objective
- Python: Q-learning gridworld + value heatmap + policy arrows
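The chapter's gridworld simulation can be sketched as below. The 4×4 layout, the −1 step reward, and the hyperparameters are illustrative assumptions, not the chapter's exact setup; the TD update line is the off-policy Q-learning rule itself.

```python
import numpy as np

# Toy 4x4 gridworld: start at (0,0), goal at (3,3), reward -1 per step,
# 0 for the step that reaches the goal. All constants are assumptions.
n, gamma, alpha, eps = 4, 0.9, 0.1, 0.1
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
Q = np.zeros((n, n, 4))
rng = np.random.default_rng(0)

def step(s, a):
    r, c = s
    dr, dc = actions[a]
    r2 = min(max(r + dr, 0), n - 1)           # clip moves at the walls
    c2 = min(max(c + dc, 0), n - 1)
    done = (r2, c2) == (n - 1, n - 1)
    return (r2, c2), (0.0 if done else -1.0), done

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy behaviour policy
        a = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Off-policy TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
        s = s2

V = Q.max(axis=2)          # greedy state-value estimate per cell
print(np.round(V, 2))      # crude text "heatmap": values rise toward the goal
```

The learned V should increase along any path toward the goal, which is what the chapter's heatmap and policy arrows visualise.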
Chapter 20: Graph Neural Networks

Message passing, spectral graph theory, GCN derivation from the graph Laplacian, graph attention networks, and molecular property prediction.

- Graph representation: A, D, X matrices
- Message passing: AGGREGATE + UPDATE framework
- Graph Laplacian L = D − A, eigendecomposition
- GCN derivation: Chebyshev → D̃^{-1/2} Ã D̃^{-1/2} H W
- GAT: attention coefficients via softmax
- Graph-level readout and pooling
- Python: GCN on Karate Club graph, node embedding viz
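A minimal sketch of the propagation rule on the chapter's Karate Club example, with random (untrained) weights and one-hot node features as assumptions; it demonstrates the normalised operator D̃^{-1/2} Ã D̃^{-1/2} H W, not a trained model.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
N = G.number_of_nodes()                       # 34 nodes
A = nx.to_numpy_array(G)                      # adjacency matrix A
A_tilde = A + np.eye(N)                       # add self-loops: Ã = A + I
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)               # D̃^{-1/2}
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalisation

rng = np.random.default_rng(0)
H = np.eye(N)                                 # one-hot features X = I (assumption)
W = rng.normal(size=(N, 2))                   # random projection to 2-D embeddings
H1 = np.maximum(A_hat @ H @ W, 0)             # ReLU(Â H W): one GCN layer
print(H1.shape)                               # (34, 2)
```

Even untrained, the smoothing effect of Â pulls connected nodes toward similar embeddings, which is what the node-embedding visualisation exploits.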
Chapter 21: Diffusion Models

Forward noising process, the closed-form q(x_t | x_0) derivation, simplification of the variational training objective to L_simple, score matching, and DDPM/DDIM sampling.

- Forward process q(x_t | x_{t-1}) = N(√(1−β_t) x_{t-1}, β_t I)
- Closed-form q(x_t | x_0) via ᾱ_t reparameterisation
- Reverse process p_θ(x_{t-1} | x_t)
- ELBO → L_simple = E[||ε − ε_θ(x_t, t)||²]
- Score matching: s(x) = ∇_x log p(x)
- DDPM / DDIM sampling, classifier-free guidance
- Python: 1D diffusion, forward noising + denoising recovery
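The forward half of the chapter's 1D simulation can be sketched as follows; the linear β schedule endpoints are the common DDPM defaults, while the bimodal toy data distribution is an assumption.

```python
import numpy as np

# Forward noising via the closed form q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).
T = 1000
betas = np.linspace(1e-4, 0.02, T)            # linear beta schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)                # abar_t = prod_{s<=t} alpha_s

rng = np.random.default_rng(0)
x0 = rng.choice([-1.0, 1.0], size=5000)       # bimodal toy data (assumption)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in one shot via reparameterisation."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x_mid = q_sample(x0, T // 2)   # partially noised
x_T = q_sample(x0, T - 1)      # fully noised: approximately N(0, 1)
print(round(x_T.mean(), 3), round(x_T.std(), 3))
```

Because the schedule drives ᾱ_T toward zero, x_T is statistically indistinguishable from a standard Gaussian, which is what lets the reverse process start from pure noise.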

What you will learn

✓ Derive the Bellman equation from the definition of V^π(s)
✓ Prove convergence of Q-learning via contraction mapping
✓ Derive the REINFORCE policy gradient using the log-derivative trick
✓ Explain the clipped surrogate objective in PPO
✓ Derive the GCN layer from spectral convolution on graphs
✓ Implement message passing and attention in graph networks
✓ Derive q(x_t | x_0) in closed form using reparameterisation
✓ Simplify the diffusion ELBO to the noise-prediction L_simple loss
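As a taste of these derivations, the closed-form forward marginal follows by unrolling the forward update and merging independent Gaussians (a sketch, writing α_t = 1 − β_t and ᾱ_t = ∏_{s≤t} α_s):

```latex
x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}
    = \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}
    = \cdots
    = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\, x_0,\; (1-\bar{\alpha}_t) I\right)
```

The middle step uses the fact that a sum of independent zero-mean Gaussians is Gaussian with summed variances: α_t(1 − α_{t−1}) + (1 − α_t) = 1 − α_t α_{t−1}.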

Prerequisites

Parts I–VI: mathematical foundations, supervised learning, neural networks, unsupervised learning, probabilistic ML, and sequence models. You should be comfortable with probability theory, gradient descent, and variational inference before beginning this part.