Chapter 12.1: The Hamilton-Jacobi-Bellman Equation

Optimal Control and the Value Function

Mean Field Games (MFGs) describe the strategic behaviour of a continuum of rational agents. Each agent solves an optimal control problem: minimise a cost functional that depends on both their own trajectory and the density of all other agents. The Hamilton-Jacobi-Bellman (HJB) equation is the fundamental PDE of optimal control—it characterises the value function, which encodes the minimum cost-to-go from any state.

In urban dynamics, the agents are commuters choosing routes, the state is their position, and the cost includes travel time (which depends on congestion, i.e., the density of other commuters). The HJB equation is one half of the MFG system; the other half is the Fokker-Planck equation for the agent density, which we develop in the next chapter.

12.1.1 The Dynamic Programming Principle

Consider an agent at position \(x\) at time \(t\), choosing a control \(\alpha(s)\) for \(s \in [t, T]\) to minimise:

$$J[x, t, \alpha] = \int_t^T L\bigl(X(s), \alpha(s), \rho(s)\bigr) \, ds + g\bigl(X(T)\bigr)$$

subject to the stochastic dynamics:

$$dX(s) = \alpha(s) \, ds + \sqrt{2\nu} \, dW(s)$$

Here \(L\) is the running cost (Lagrangian), \(g\) is the terminal cost, \(\nu > 0\) is the noise intensity (modelling random perturbations in the agent’s trajectory), and \(W(s)\) is a Wiener process.

The value function is the minimum expected cost:

$$u(x, t) = \inf_{\alpha} \, \mathbb{E}\!\left[\int_t^T L(X, \alpha, \rho) \, ds + g(X(T)) \;\Big|\; X(t) = x\right]$$

Bellman’s principle of optimality states that for any small \(dt\):

$$u(x, t) = \inf_{\alpha} \, \mathbb{E}\!\left[ \int_t^{t+dt} L\bigl(X(s), \alpha(s), \rho(s)\bigr) \, ds + u\bigl(X(t+dt), t+dt\bigr) \;\Big|\; X(t) = x \right]$$
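Bellman’s principle is exactly the backward recursion of discrete dynamic programming, so it can be checked directly on a grid. The sketch below uses an illustrative deterministic 1D problem (the grid, horizon, and costs are our own choices, not taken from the text): stepping backward from the terminal cost recovers the cost-to-go.

```python
import numpy as np

# Discrete dynamic programming: at each step the agent pays the running
# cost 0.5*alpha^2 over dt and moves by alpha*dt (deterministic dynamics,
# nu = 0, for clarity).  All parameters here are illustrative choices.
nx, nt = 101, 50
x = np.linspace(-2.0, 2.0, nx)
dt = 0.02                               # horizon T = nt*dt = 1
controls = np.linspace(-3.0, 3.0, 61)   # candidate controls alpha
g = (x - 1.0)**2                        # terminal cost g(x)

u = g.copy()                            # u(x, T) = g(x)
for _ in range(nt):                     # backward in time
    # One Bellman step: minimise running cost over dt plus the
    # interpolated cost-to-go at the new state.
    x_next = x[:, None] + controls[None, :] * dt
    u_next = np.interp(x_next.ravel(), x, u).reshape(x_next.shape)
    u = np.min(0.5 * controls[None, :]**2 * dt + u_next, axis=1)

print("cost-to-go from x = 0:", u[nx // 2])
```

For this terminal cost the continuous-time optimum from \(x = 0\) is \(1/3\) (reached by a constant control), and the recursion approaches it as the grids are refined.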

12.1.2 Deriving the HJB Equation

Expanding \(u(X(t+dt), t+dt)\) by Itô’s formula, using \(dX = \alpha \, dt + \sqrt{2\nu} \, dW\):

$$\mathbb{E}[u(X+dX, t+dt)] = u + \frac{\partial u}{\partial t} dt + \alpha \frac{\partial u}{\partial x} dt + \nu \frac{\partial^2 u}{\partial x^2} dt + O(dt^2)$$

Substituting into Bellman’s equation and taking \(dt \to 0\):
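In detail: the \(u(x, t)\) term cancels from both sides, and dividing by \(dt\) leaves

$$0 = \frac{\partial u}{\partial t} + \nu \Delta u + \inf_{\alpha} \bigl\{ L(x, \alpha, \rho) + \alpha \cdot \nabla u \bigr\}$$

Recognising \(\inf_{\alpha}\{L + \alpha \cdot \nabla u\} = -\sup_{\alpha}\{-\alpha \cdot \nabla u - L\}\) and moving the \(\rho\)-dependent part of the cost to the right-hand side yields the HJB equation: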

$$-\frac{\partial u}{\partial t} - \nu \Delta u + H(x, \nabla u) = F(x, \rho)$$

where we have split the running cost as \(L(x, \alpha, \rho) = L_0(x, \alpha) + F(x, \rho)\), so that the mean-field coupling \(F\) appears as a source term, and the Hamiltonian \(H\) arises from the optimisation over controls:

$$H(x, p) = \sup_{\alpha} \bigl\{-\alpha \cdot p - L_0(x, \alpha)\bigr\}$$

For a quadratic running cost \(\frac{1}{2}|\alpha|^2\) (the kinetic energy of the agent), the Hamiltonian is:

$$H(x, p) = \frac{1}{2}|p|^2$$

and the HJB equation becomes:

$$-\frac{\partial u}{\partial t} - \nu \Delta u + \frac{1}{2}|\nabla u|^2 = F(x, \rho)$$

Optimal Control

The optimal control is obtained from the first-order condition of the supremum:

$$\alpha^*(x, t) = -\nabla u(x, t)$$

Agents move down the gradient of the value function: they flow toward regions of lower cost-to-go. This is the steepest descent principle of optimal control.
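Both facts — \(H(p) = \frac{1}{2}|p|^2\) and the maximiser \(\alpha^* = -p\) — can be sanity-checked numerically by brute-force maximisation over a dense grid of controls (an illustrative check, with grid and test values of our own choosing):

```python
import numpy as np

# For L = 0.5*alpha^2, verify H(p) = sup_alpha { -alpha*p - 0.5*alpha^2 }
# equals 0.5*p^2, with the supremum attained at alpha = -p.
alphas = np.linspace(-10.0, 10.0, 20001)        # dense grid of controls
ps = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])      # test values of p
vals = -alphas[None, :] * ps[:, None] - 0.5 * alphas[None, :]**2
H = vals.max(axis=1)                            # sup over alpha
a_star = alphas[vals.argmax(axis=1)]            # maximising control
print("max |H(p) - p^2/2| :", np.max(np.abs(H - 0.5 * ps**2)))
print("max |alpha* + p|   :", np.max(np.abs(a_star + ps)))
```

Both discrepancies vanish up to the grid resolution, confirming the Legendre-transform computation.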

12.1.3 Cole-Hopf Transformation

The nonlinear HJB equation can be linearised by the Cole-Hopf transformation. Define:

$$u(x, t) = -2\nu \ln \psi(x, t)$$

Then \(\nabla u = -2\nu \nabla\psi / \psi\) and:

$$|\nabla u|^2 = 4\nu^2 \frac{|\nabla\psi|^2}{\psi^2}, \qquad \Delta u = -2\nu \frac{\Delta\psi}{\psi} + 2\nu \frac{|\nabla\psi|^2}{\psi^2}$$

Substituting into the HJB equation (with \(F = 0\) for simplicity):

$$\frac{\partial \psi}{\partial t} = -\nu \Delta \psi$$

This is the backward heat equation—or equivalently, the imaginary-time Schrödinger equation; it is well-posed backward in time from the terminal data \(\psi(x, T) = e^{-g(x)/(2\nu)}\). The nonlinear HJB for the value function is transformed into a linear diffusion equation for \(\psi\). When \(F \neq 0\), we get a Schrödinger equation with potential:

$$\frac{\partial \psi}{\partial t} = -\nu \Delta \psi + \frac{F(x, \rho)}{2\nu} \psi$$

This connects optimal control to quantum mechanics: the value function plays the role of the action, and the Cole-Hopf variable\(\psi\) is the wave function. Low-cost paths correspond to high-probability quantum paths.
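The transformation can be verified numerically: solve the HJB equation directly by finite differences, separately solve the linear \(\psi\)-equation, and map back via \(u = -2\nu \ln \psi\). The sketch below (grid sizes, \(\nu\), horizon, and terminal cost are illustrative choices, not the chapter’s script) shows the two routes agree away from the crudely handled boundaries.

```python
import numpy as np

# Cross-check Cole-Hopf: march both equations in tau = T - t (i.e. backward
# in physical time), starting from the terminal condition at tau = 0.
nu = 0.5
nx, nt = 201, 1000
x = np.linspace(-4.0, 4.0, nx); dx = x[1] - x[0]
T = 0.25; dtau = T / nt
g = 0.5 * x**2                           # terminal cost u(x, T) = g(x)

def lap(v):                              # second difference, ends copied (crude)
    out = np.empty_like(v)
    out[1:-1] = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx**2
    out[0], out[-1] = out[1], out[-2]
    return out

u = g.copy()                             # direct: u_tau = nu*u_xx - 0.5*u_x^2
psi = np.exp(-g / (2.0 * nu))            # Cole-Hopf: psi_tau = nu*psi_xx
for _ in range(nt):
    u = u + dtau * (nu * lap(u) - 0.5 * np.gradient(u, dx)**2)
    psi = psi + dtau * nu * lap(psi)

u_ch = -2.0 * nu * np.log(psi)           # map psi back to a value function
interior = np.abs(x) < 2.0               # compare away from the boundaries
err = np.abs(u - u_ch)[interior].max()
print("max interior discrepancy:", err)
```

In the \(\tau\) variable both marches are ordinary forward heat-type steps, which is exactly why the Cole-Hopf route is attractive numerically: the \(\psi\)-equation is linear and needs no treatment of the \(\frac{1}{2}|\nabla u|^2\) term.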

12.1.4 Numerical Solution of the 1D HJB

We solve the 1D HJB equation via finite differences, both directly and through the Cole-Hopf transformation. We visualise the value function and optimal trajectories.

1D HJB Equation: Value Function and Optimal Trajectories

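The chapter’s full script is not reproduced here; the following is a minimal sketch of the same experiment under illustrative parameters of our own choosing: solve the HJB equation backward in time with a fixed (frozen) congestion cost \(F\), then integrate agent trajectories under the optimal control \(\alpha^* = -\partial_x u\).

```python
import numpy as np

# 1D HJB solve + optimal trajectories.  F is a frozen congestion penalty
# (the full MFG would couple it to the evolving density rho).
rng = np.random.default_rng(0)
nu = 0.3
nx, nt = 201, 1200
x = np.linspace(-3.0, 3.0, nx); dx = x[1] - x[0]
T = 1.0; dt = T / nt
g = (x - 2.0)**2                        # terminal cost: end near x = 2
F = 2.0 * np.exp(-x**2)                 # congestion penalty peaked at x = 0

def lap(v):                             # second difference, ends copied (crude)
    out = np.empty_like(v)
    out[1:-1] = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx**2
    out[0], out[-1] = out[1], out[-2]
    return out

# March backward from u(x, T) = g(x):
# -u_t - nu*u_xx + 0.5*u_x^2 = F  =>  U[n-1] = U[n] + dt*(nu*u_xx - 0.5*u_x^2 + F)
U = np.empty((nt + 1, nx))
U[nt] = g
for n in range(nt, 0, -1):
    ux = np.gradient(U[n], dx)
    U[n - 1] = U[n] + dt * (nu * lap(U[n]) - 0.5 * ux**2 + F)

# Simulate agents under the optimal control alpha* = -u_x (Euler-Maruyama)
X = np.full(5, -2.0)                    # five agents, all starting at x = -2
for n in range(nt):
    drift = -np.interp(X, x, np.gradient(U[n], dx))
    X = X + drift * dt + np.sqrt(2.0 * nu * dt) * rng.standard_normal(X.shape)
print("final positions:", np.round(X, 2))
```

The agents drift toward the low terminal cost at \(x = 2\) while the bump in \(F\) makes the region around \(x = 0\) expensive to linger in; plotting `U` as a heat map and overlaying the sampled paths reproduces the visualisation described above.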

Key Takeaways

  • The value function \(u(x,t)\) encodes the minimum expected cost-to-go and satisfies the HJB equation.
  • The HJB equation \(-\partial_t u - \nu\Delta u + H(x,\nabla u) = F(x,\rho)\) is solved backward in time from the terminal condition.
  • The optimal control is \(\alpha^* = -\nabla u\): agents flow down the gradient of the value function.
  • The Cole-Hopf transformation \(u = -2\nu\ln\psi\) linearises the HJB into an imaginary-time Schrödinger equation.
  • The HJB is one half of the MFG system; the other half (Fokker-Planck) determines how the density \(\rho\) evolves under the optimal control.