Machine Learning

Part VII: Advanced Topics

The frontier of modern machine learning: agents that learn by interacting with environments, neural networks that reason over structured relational data, and generative models that learn to sculpt structure out of pure noise. Each chapter delivers full mathematical derivations, SVG diagrams, and runnable Python simulations.

Chapter 19: Reinforcement Learning

MDPs, Bellman equations, Q-learning, policy gradients (REINFORCE), Actor-Critic, and PPO, with full derivations from the ground up.

- MDP formulation: (S, A, P, R, γ)
- Value function V^π(s) and Bellman equation derivation
- Action-value Q^π(s, a) and optimal Bellman operator
- Q-learning: off-policy TD update rule
- REINFORCE: log-derivative trick → policy gradient
- Actor-Critic and PPO clipped surrogate objective
- Python: Q-learning gridworld + value heatmap + policy arrows
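The chapter's gridworld simulation can be sketched as below. The 4×4 layout, the −1 step reward, and the hyperparameters are illustrative assumptions, not the chapter's exact setup; the TD update line is the off-policy Q-learning rule itself.

```python
import numpy as np

# Toy 4x4 gridworld: start at (0,0), goal at (3,3), reward -1 per step,
# 0 for the step that reaches the goal. All constants are assumptions.
n, gamma, alpha, eps = 4, 0.9, 0.1, 0.1
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
Q = np.zeros((n, n, 4))
rng = np.random.default_rng(0)

def step(s, a):
    r, c = s
    dr, dc = actions[a]
    r2 = min(max(r + dr, 0), n - 1)           # clip moves at the walls
    c2 = min(max(c + dc, 0), n - 1)
    done = (r2, c2) == (n - 1, n - 1)
    return (r2, c2), (0.0 if done else -1.0), done

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy behaviour policy
        a = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Off-policy TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
        s = s2

V = Q.max(axis=2)          # greedy state-value estimate per cell
print(np.round(V, 2))      # crude text "heatmap": values rise toward the goal
```

The learned V should increase along any path toward the goal, which is what the chapter's heatmap and policy arrows visualise.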
Chapter 20: Graph Neural Networks

Message passing, spectral graph theory, GCN derivation from the graph Laplacian, graph attention networks, and molecular property prediction.

- Graph representation: A, D, X matrices
- Message passing: AGGREGATE + UPDATE framework
- Graph Laplacian L = D − A, eigendecomposition
- GCN derivation: Chebyshev → D̃^{-1/2} Ã D̃^{-1/2} H W
- GAT: attention coefficients via softmax
- Graph-level readout and pooling
- Python: GCN on Karate Club graph, node embedding viz
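A minimal sketch of the propagation rule on the chapter's Karate Club example, with random (untrained) weights and one-hot node features as assumptions; it demonstrates the normalised operator D̃^{-1/2} Ã D̃^{-1/2} H W, not a trained model.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
N = G.number_of_nodes()                       # 34 nodes
A = nx.to_numpy_array(G)                      # adjacency matrix A
A_tilde = A + np.eye(N)                       # add self-loops: Ã = A + I
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)               # D̃^{-1/2}
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalisation

rng = np.random.default_rng(0)
H = np.eye(N)                                 # one-hot features X = I (assumption)
W = rng.normal(size=(N, 2))                   # random projection to 2-D embeddings
H1 = np.maximum(A_hat @ H @ W, 0)             # ReLU(Â H W): one GCN layer
print(H1.shape)                               # (34, 2)
```

Even untrained, the smoothing effect of Â pulls connected nodes toward similar embeddings, which is what the node-embedding visualisation exploits.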
Chapter 21: Diffusion Models

Forward noising process, the closed-form q(x_t | x_0) derivation, simplification of the variational training objective to L_simple, score matching, and DDPM/DDIM sampling.

- Forward process q(x_t | x_{t-1}) = N(√(1−β_t) x_{t-1}, β_t I)
- Closed-form q(x_t | x_0) via ᾱ_t reparameterisation
- Reverse process p_θ(x_{t-1} | x_t)
- ELBO → L_simple = E[||ε − ε_θ(x_t, t)||²]
- Score matching: s(x) = ∇_x log p(x)
- DDPM / DDIM sampling, classifier-free guidance
- Python: 1D diffusion, forward noising + denoising recovery
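The forward half of the chapter's 1D simulation can be sketched as follows; the linear β schedule endpoints are the common DDPM defaults, while the bimodal toy data distribution is an assumption.

```python
import numpy as np

# Forward noising via the closed form q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).
T = 1000
betas = np.linspace(1e-4, 0.02, T)            # linear beta schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)                # abar_t = prod_{s<=t} alpha_s

rng = np.random.default_rng(0)
x0 = rng.choice([-1.0, 1.0], size=5000)       # bimodal toy data (assumption)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in one shot via reparameterisation."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x_mid = q_sample(x0, T // 2)   # partially noised
x_T = q_sample(x0, T - 1)      # fully noised: approximately N(0, 1)
print(round(x_T.mean(), 3), round(x_T.std(), 3))
```

Because the schedule drives ᾱ_T toward zero, x_T is statistically indistinguishable from a standard Gaussian, which is what lets the reverse process start from pure noise.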

What you will learn

✓ Derive the Bellman equation from the definition of V^π(s)
✓ Prove convergence of Q-learning via contraction mapping
✓ Derive the REINFORCE policy gradient using the log-derivative trick
✓ Explain the clipped surrogate objective in PPO
✓ Derive the GCN layer from spectral convolution on graphs
✓ Implement message passing and attention in graph networks
✓ Derive q(x_t | x_0) in closed form using reparameterisation
✓ Simplify the diffusion ELBO to the noise-prediction L_simple loss
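As a taste of these derivations, the closed-form forward marginal follows by unrolling the forward update and merging independent Gaussians (a sketch, writing α_t = 1 − β_t and ᾱ_t = ∏_{s≤t} α_s):

```latex
x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}
    = \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}
    = \cdots
    = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\, x_0,\; (1-\bar{\alpha}_t) I\right)
```

The middle step uses the fact that a sum of independent zero-mean Gaussians is Gaussian with summed variances: α_t(1 − α_{t−1}) + (1 − α_t) = 1 − α_t α_{t−1}.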

Prerequisites

Parts I–VI: mathematical foundations, supervised learning, neural networks, unsupervised learning, probabilistic ML, and sequence models. You should be comfortable with probability theory, gradient descent, and variational inference before beginning this part.