Chapter 12.1: The Hamilton-Jacobi-Bellman Equation

Optimal Control and the Value Function

Mean Field Games (MFGs) describe the strategic behaviour of a continuum of rational agents. Each agent solves an optimal control problem: minimise a cost functional that depends on both their own trajectory and the density of all other agents. The Hamilton-Jacobi-Bellman (HJB) equation is the fundamental PDE of optimal control—it characterises the value function, which encodes the minimum cost-to-go from any state.

In urban dynamics, the agents are commuters choosing routes, the state is their position, and the cost includes travel time (which depends on congestion, i.e., the density of other commuters). The HJB equation is one half of the MFG system; the other half is the Fokker-Planck equation for the agent density, which we develop in the next chapter.

12.1.1 The Dynamic Programming Principle

Consider an agent at position \(x\) at time \(t\), choosing a control \(\alpha(s)\) for \(s \in [t, T]\) to minimise:

$$J[x, t, \alpha] = \int_t^T L\bigl(X(s), \alpha(s), \rho(s)\bigr) \, ds + g\bigl(X(T)\bigr)$$

subject to the stochastic dynamics:

$$dX(s) = \alpha(s) \, ds + \sqrt{2\nu} \, dW(s)$$

Here \(L\) is the running cost (Lagrangian), \(g\) is the terminal cost, \(\nu > 0\) is the noise intensity (modelling random perturbations in the agent’s trajectory), and \(W(s)\) is a Wiener process.

The value function is the minimum expected cost:

$$u(x, t) = \inf_{\alpha} \, \mathbb{E}\!\left[\int_t^T L(X, \alpha, \rho) \, ds + g(X(T)) \;\Big|\; X(t) = x\right]$$

Bellman’s principle of optimality states that for any small \(dt\):

$$u(x, t) = \inf_{\alpha} \, \mathbb{E}\!\left[ \int_t^{t+dt} L\bigl(X(s), \alpha(s), \rho(s)\bigr) \, ds + u\bigl(X(t+dt), t+dt\bigr) \;\Big|\; X(t) = x \right]$$
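Bellman’s principle is exactly the backward recursion of discrete dynamic programming, so it can be checked directly on a grid. The sketch below uses an illustrative deterministic 1D problem (the grid, horizon, and costs are our own choices, not taken from the text): stepping backward from the terminal cost recovers the cost-to-go.

```python
import numpy as np

# Discrete dynamic programming: at each step the agent pays the running
# cost 0.5*alpha^2 over dt and moves by alpha*dt (deterministic dynamics,
# nu = 0, for clarity).  All parameters here are illustrative choices.
nx, nt = 101, 50
x = np.linspace(-2.0, 2.0, nx)
dt = 0.02                               # horizon T = nt*dt = 1
controls = np.linspace(-3.0, 3.0, 61)   # candidate controls alpha
g = (x - 1.0)**2                        # terminal cost g(x)

u = g.copy()                            # u(x, T) = g(x)
for _ in range(nt):                     # backward in time
    # One Bellman step: minimise running cost over dt plus the
    # interpolated cost-to-go at the new state.
    x_next = x[:, None] + controls[None, :] * dt
    u_next = np.interp(x_next.ravel(), x, u).reshape(x_next.shape)
    u = np.min(0.5 * controls[None, :]**2 * dt + u_next, axis=1)

print("cost-to-go from x = 0:", u[nx // 2])
```

For this terminal cost the continuous-time optimum from \(x = 0\) is \(1/3\) (reached by a constant control), and the recursion approaches it as the grids are refined.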

12.1.2 Deriving the HJB Equation

Expanding \(u(X(t+dt), t+dt)\) by Itô’s formula, using \(dX = \alpha \, dt + \sqrt{2\nu} \, dW\):

$$\mathbb{E}[u(X+dX, t+dt)] = u + \frac{\partial u}{\partial t} dt + \alpha \frac{\partial u}{\partial x} dt + \nu \frac{\partial^2 u}{\partial x^2} dt + O(dt^2)$$

Substituting into Bellman’s equation and taking \(dt \to 0\):
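In detail: the \(u(x, t)\) term cancels from both sides, and dividing by \(dt\) leaves

$$0 = \frac{\partial u}{\partial t} + \nu \Delta u + \inf_{\alpha} \bigl\{ L(x, \alpha, \rho) + \alpha \cdot \nabla u \bigr\}$$

Recognising \(\inf_{\alpha}\{L + \alpha \cdot \nabla u\} = -\sup_{\alpha}\{-\alpha \cdot \nabla u - L\}\) and moving the \(\rho\)-dependent part of the cost to the right-hand side yields the HJB equation: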

$$-\frac{\partial u}{\partial t} - \nu \Delta u + H(x, \nabla u) = F(x, \rho)$$

where we have split the running cost as \(L(x, \alpha, \rho) = L_0(x, \alpha) + F(x, \rho)\), so that the mean-field coupling \(F\) appears as a source term, and the Hamiltonian \(H\) arises from the optimisation over controls:

$$H(x, p) = \sup_{\alpha} \bigl\{-\alpha \cdot p - L_0(x, \alpha)\bigr\}$$

For a quadratic running cost \(\frac{1}{2}|\alpha|^2\) (the kinetic energy of the agent), the Hamiltonian is:

$$H(x, p) = \frac{1}{2}|p|^2$$

and the HJB equation becomes:

$$-\frac{\partial u}{\partial t} - \nu \Delta u + \frac{1}{2}|\nabla u|^2 = F(x, \rho)$$

Optimal Control

The optimal control is obtained from the first-order condition of the supremum:

$$\alpha^*(x, t) = -\nabla u(x, t)$$

Agents move down the gradient of the value function: they flow toward regions of lower cost-to-go. This is the steepest descent principle of optimal control.
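Both facts — \(H(p) = \frac{1}{2}|p|^2\) and the maximiser \(\alpha^* = -p\) — can be sanity-checked numerically by brute-force maximisation over a dense grid of controls (an illustrative check, with grid and test values of our own choosing):

```python
import numpy as np

# For L = 0.5*alpha^2, verify H(p) = sup_alpha { -alpha*p - 0.5*alpha^2 }
# equals 0.5*p^2, with the supremum attained at alpha = -p.
alphas = np.linspace(-10.0, 10.0, 20001)        # dense grid of controls
ps = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])      # test values of p
vals = -alphas[None, :] * ps[:, None] - 0.5 * alphas[None, :]**2
H = vals.max(axis=1)                            # sup over alpha
a_star = alphas[vals.argmax(axis=1)]            # maximising control
print("max |H(p) - p^2/2| :", np.max(np.abs(H - 0.5 * ps**2)))
print("max |alpha* + p|   :", np.max(np.abs(a_star + ps)))
```

Both discrepancies vanish up to the grid resolution, confirming the Legendre-transform computation.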

12.1.3 Cole-Hopf Transformation

The nonlinear HJB equation can be linearised by the Cole-Hopf transformation. Define:

$$u(x, t) = -2\nu \ln \psi(x, t)$$

Then \(\nabla u = -2\nu \nabla\psi / \psi\) and:

$$|\nabla u|^2 = 4\nu^2 \frac{|\nabla\psi|^2}{\psi^2}, \qquad \Delta u = -2\nu \frac{\Delta\psi}{\psi} + 2\nu \frac{|\nabla\psi|^2}{\psi^2}$$

Substituting into the HJB equation (with \(F = 0\) for simplicity):

$$\frac{\partial \psi}{\partial t} = -\nu \Delta \psi$$

This is the backward heat equation—or equivalently, the imaginary-time Schrödinger equation; it is well-posed backward in time from the terminal data \(\psi(x, T) = e^{-g(x)/(2\nu)}\). The nonlinear HJB for the value function is transformed into a linear diffusion equation for \(\psi\). When \(F \neq 0\), we get a Schrödinger equation with potential:

$$\frac{\partial \psi}{\partial t} = -\nu \Delta \psi + \frac{F(x, \rho)}{2\nu} \psi$$

This connects optimal control to quantum mechanics: the value function plays the role of the action, and the Cole-Hopf variable\(\psi\) is the wave function. Low-cost paths correspond to high-probability quantum paths.
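The transformation can be verified numerically: solve the HJB equation directly by finite differences, separately solve the linear \(\psi\)-equation, and map back via \(u = -2\nu \ln \psi\). The sketch below (grid sizes, \(\nu\), horizon, and terminal cost are illustrative choices, not the chapter’s script) shows the two routes agree away from the crudely handled boundaries.

```python
import numpy as np

# Cross-check Cole-Hopf: march both equations in tau = T - t (i.e. backward
# in physical time), starting from the terminal condition at tau = 0.
nu = 0.5
nx, nt = 201, 1000
x = np.linspace(-4.0, 4.0, nx); dx = x[1] - x[0]
T = 0.25; dtau = T / nt
g = 0.5 * x**2                           # terminal cost u(x, T) = g(x)

def lap(v):                              # second difference, ends copied (crude)
    out = np.empty_like(v)
    out[1:-1] = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx**2
    out[0], out[-1] = out[1], out[-2]
    return out

u = g.copy()                             # direct: u_tau = nu*u_xx - 0.5*u_x^2
psi = np.exp(-g / (2.0 * nu))            # Cole-Hopf: psi_tau = nu*psi_xx
for _ in range(nt):
    u = u + dtau * (nu * lap(u) - 0.5 * np.gradient(u, dx)**2)
    psi = psi + dtau * nu * lap(psi)

u_ch = -2.0 * nu * np.log(psi)           # map psi back to a value function
interior = np.abs(x) < 2.0               # compare away from the boundaries
err = np.abs(u - u_ch)[interior].max()
print("max interior discrepancy:", err)
```

In the \(\tau\) variable both marches are ordinary forward heat-type steps, which is exactly why the Cole-Hopf route is attractive numerically: the \(\psi\)-equation is linear and needs no treatment of the \(\frac{1}{2}|\nabla u|^2\) term.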

12.1.4 Numerical Solution of the 1D HJB

We solve the 1D HJB equation via finite differences, both directly and through the Cole-Hopf transformation. We visualise the value function and optimal trajectories.

1D HJB Equation: Value Function and Optimal Trajectories

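The chapter’s full script is not reproduced here; the following is a minimal sketch of the same experiment under illustrative parameters of our own choosing: solve the HJB equation backward in time with a fixed (frozen) congestion cost \(F\), then integrate agent trajectories under the optimal control \(\alpha^* = -\partial_x u\).

```python
import numpy as np

# 1D HJB solve + optimal trajectories.  F is a frozen congestion penalty
# (the full MFG would couple it to the evolving density rho).
rng = np.random.default_rng(0)
nu = 0.3
nx, nt = 201, 1200
x = np.linspace(-3.0, 3.0, nx); dx = x[1] - x[0]
T = 1.0; dt = T / nt
g = (x - 2.0)**2                        # terminal cost: end near x = 2
F = 2.0 * np.exp(-x**2)                 # congestion penalty peaked at x = 0

def lap(v):                             # second difference, ends copied (crude)
    out = np.empty_like(v)
    out[1:-1] = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx**2
    out[0], out[-1] = out[1], out[-2]
    return out

# March backward from u(x, T) = g(x):
# -u_t - nu*u_xx + 0.5*u_x^2 = F  =>  U[n-1] = U[n] + dt*(nu*u_xx - 0.5*u_x^2 + F)
U = np.empty((nt + 1, nx))
U[nt] = g
for n in range(nt, 0, -1):
    ux = np.gradient(U[n], dx)
    U[n - 1] = U[n] + dt * (nu * lap(U[n]) - 0.5 * ux**2 + F)

# Simulate agents under the optimal control alpha* = -u_x (Euler-Maruyama)
X = np.full(5, -2.0)                    # five agents, all starting at x = -2
for n in range(nt):
    drift = -np.interp(X, x, np.gradient(U[n], dx))
    X = X + drift * dt + np.sqrt(2.0 * nu * dt) * rng.standard_normal(X.shape)
print("final positions:", np.round(X, 2))
```

The agents drift toward the low terminal cost at \(x = 2\) while the bump in \(F\) makes the region around \(x = 0\) expensive to linger in; plotting `U` as a heat map and overlaying the sampled paths reproduces the visualisation described above.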

Key Takeaways

  • The value function \(u(x,t)\) encodes the minimum expected cost-to-go and satisfies the HJB equation.
  • The HJB equation \(-\partial_t u - \nu\Delta u + H(x,\nabla u) = F(x,\rho)\) is solved backward in time from the terminal condition.
  • The optimal control is \(\alpha^* = -\nabla u\): agents flow down the gradient of the value function.
  • The Cole-Hopf transformation \(u = -2\nu\ln\psi\) linearises the HJB into an imaginary-time Schrödinger equation.
  • The HJB is one half of the MFG system; the other half (Fokker-Planck) determines how the density \(\rho\) evolves under the optimal control.