Chapter 12.3: MFG Numerics - Schemes, Policy Iteration, and Convergence

Numerical Methods for Mean Field Games

Solving the MFG system numerically requires careful discretisation that preserves the mathematical structure: the HJB discretisation must be monotone (to ensure convergence to the viscosity solution), and the FP discretisation must be positivity-preserving (densities must remain non-negative). This chapter develops the Achdou-Capuzzo-Dolcetta upwind scheme for HJB, the Scharfetter-Gummel exponential fitting scheme for FP, and the superlinearly convergent policy iteration algorithm.

12.3.1 Achdou-Capuzzo-Dolcetta Upwind Scheme

The HJB equation contains the nonlinear term \(\frac{1}{2}|\nabla u|^2\), which must be discretised carefully to maintain the monotonicity property: the discrete update must be non-decreasing in each neighbouring value. This ensures convergence to the unique viscosity solution.

Upwind Discretisation of the Hamiltonian

For the quadratic Hamiltonian \(H(p) = \frac{1}{2}|p|^2\), the monotone discretisation uses upwind differences:

$$H_h(u_{i-1}, u_i, u_{i+1}) = \frac{1}{2}\left[\left(\frac{u_i - u_{i-1}}{h}\right)^+\right]^2 + \frac{1}{2}\left[\left(\frac{u_{i+1} - u_i}{h}\right)^-\right]^2$$

where \(a^+ = \max(a, 0)\) and \(a^- = \min(a, 0)\). This ensures that information propagates in the correct direction: the value at \(i\) depends on upstream values.
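
The upwind Hamiltonian above is a few lines of NumPy. The sketch below uses periodic boundaries and a sine test function as illustrative choices; the scheme is first-order accurate, so the discrete values track \(\frac{1}{2}|u'|^2\) up to an \(O(h)\) error.

```python
import numpy as np

def hamiltonian_upwind(u, h):
    """Monotone (Godunov/upwind) discretisation of H(p) = 0.5*|p|^2:
    backward differences enter through their positive part, forward
    differences through their negative part. Periodic boundaries are
    an illustrative choice."""
    Dm = (u - np.roll(u, 1)) / h    # backward difference (u_i - u_{i-1})/h
    Dp = (np.roll(u, -1) - u) / h   # forward difference  (u_{i+1} - u_i)/h
    return 0.5 * np.maximum(Dm, 0.0)**2 + 0.5 * np.minimum(Dp, 0.0)**2

# first-order accurate check against the exact 0.5*|u'(x)|^2
x = np.linspace(0.0, 1.0, 200, endpoint=False)
H = hamiltonian_upwind(np.sin(2 * np.pi * x), x[1] - x[0])
exact = 0.5 * (2 * np.pi * np.cos(2 * np.pi * x))**2
```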

The Full Discrete HJB

The time-stepping scheme reads:

$$\frac{u_i^n - u_i^{n+1}}{\Delta t} - \nu \frac{u_{i+1}^{n} - 2u_i^{n} + u_{i-1}^{n}}{h^2} + H_h(u_{i-1}^n, u_i^n, u_{i+1}^n) = F(x_i, \rho_i^n)$$

Note the time direction: we step backward from \(n+1\) to \(n\). In the general form with a transition matrix:

$$u_i^n = \min_{\alpha} \left\{\Delta t \cdot L(x_i, \alpha) + \sum_j P_{ij}(\alpha) \, u_j^{n+1}\right\}$$

where \(P_{ij}(\alpha)\) is the transition probability matrix induced by control \(\alpha\) and diffusion \(\nu\).
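
The transition-matrix form lends itself to a direct implementation. The sketch below builds the jump probabilities (symmetric diffusion plus upwinded drift) for each control on a periodic 1D grid and performs one backward Bellman step; the quadratic running cost \(L = \frac{1}{2}\alpha^2 + F\) and the finite control set are illustrative assumptions.

```python
import numpy as np

def bellman_step(u_next, alphas, h, dt, nu, F):
    """One backward step of the discrete Bellman recursion
        u_i^n = min_alpha { dt*L(x_i, alpha) + sum_j P_ij(alpha) u_j^{n+1} }
    on a periodic 1D grid. P(alpha) jumps to each neighbour with diffusion
    probability nu*dt/h^2 plus an upwinded drift probability; the running
    cost L = 0.5*alpha^2 + F is an illustrative choice.
    """
    d = nu * dt / h**2                        # diffusion jump probability per side
    best = np.full(len(u_next), np.inf)
    for a in alphas:
        p_right = d + dt * max(a, 0.0) / h    # drift to the right if a > 0
        p_left = d + dt * max(-a, 0.0) / h    # drift to the left if a < 0
        p_stay = 1.0 - p_left - p_right
        assert p_stay >= 0.0, "dt too large: transition matrix not monotone"
        Eu = (p_stay * u_next
              + p_right * np.roll(u_next, -1)
              + p_left * np.roll(u_next, 1))
        best = np.minimum(best, dt * (0.5 * a**2 + F) + Eu)
    return best

# sanity check: zero cost and a flat continuation value are reproduced
# exactly, since each row of P(alpha) sums to one and alpha = 0 is optimal
u0 = bellman_step(np.zeros(50), [-1.0, 0.0, 1.0], 0.02, 1e-4, 0.1, np.zeros(50))
u1 = bellman_step(np.ones(50), [-1.0, 0.0, 1.0], 0.02, 1e-4, 0.1, np.zeros(50))
```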

12.3.2 Scharfetter-Gummel Scheme for Fokker-Planck

The Fokker-Planck equation involves a drift-diffusion flux:

$$J_{i+1/2} = -\nu \frac{\rho_{i+1} - \rho_i}{h} + v_{i+1/2} \cdot \bar{\rho}_{i+1/2}$$

where \(v = -\nabla u\) is the drift velocity and \(\bar{\rho}_{i+1/2}\) is an interface value of the density. A naive central difference can produce negative densities when the drift is strong (large Péclet number). The Scharfetter-Gummel (exponential fitting) scheme avoids this by using the exact solution of the local drift-diffusion problem:

$$J_{i+1/2}^{\text{SG}} = \frac{\nu}{h}\left[\rho_i B(-z_{i+1/2}) - \rho_{i+1} B(z_{i+1/2})\right]$$

where \(z_{i+1/2} = v_{i+1/2} h / \nu\) is the local Péclet number and \(B(z) = z / (e^z - 1)\) is the Bernoulli function:

$$B(z) = \frac{z}{e^z - 1} \quad \Rightarrow \quad B(0) = 1, \quad B(z) \to z e^{-z} \text{ as } z \to +\infty$$

The Scharfetter-Gummel scheme automatically interpolates between central differences (small \(|z|\), diffusion-dominated) and upwind differences (large \(|z|\), drift-dominated), guaranteeing positivity of the density at all times.
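
Numerically, \(B(z)\) should not be evaluated as written: for small \(|z|\) the denominator \(e^z - 1\) suffers catastrophic cancellation. A small sketch of a stable evaluation using `np.expm1` (the cutoff \(10^{-12}\) is an illustrative choice):

```python
import numpy as np

def bernoulli(z):
    """Bernoulli function B(z) = z/(exp(z)-1), evaluated stably.

    np.expm1 keeps full precision near z = 0; the removable singularity
    at z = 0 itself is set to B(0) = 1 explicitly.
    """
    z = np.asarray(z, dtype=float)
    out = np.ones_like(z)
    nz = np.abs(z) > 1e-12
    out[nz] = z[nz] / np.expm1(z[nz])
    return out

# limits B(0) = 1, B(z) ~ z*exp(-z) for large z, and the identity
# B(-z) = B(z) + z used by the Scharfetter-Gummel flux
vals = bernoulli(np.array([0.0, 1e-14, 1.0, -1.0, 40.0]))
```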

Discrete FP Update

$$\rho_i^{n+1} = \rho_i^n + \frac{\Delta t}{h}\left(J_{i-1/2}^{\text{SG}} - J_{i+1/2}^{\text{SG}}\right)$$

In matrix form: \(\rho^{n+1} = P(\alpha^*)^T \rho^n\), where the transpose of the transition matrix for the HJB naturally gives the FP operator. This duality is the discrete analog of the continuous adjoint relationship between HJB and FP.
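
A minimal sketch of the Scharfetter-Gummel flux and the conservative update on a periodic grid (function names and the constant-drift demo are illustrative): for \(z \to 0\) the flux reduces to the diffusive flux, and for large \(|z|\) to the upwind flux \(v\,\rho_{\text{upwind}}\).

```python
import numpy as np

def sg_flux(rho, v_face, h, nu):
    """Scharfetter-Gummel flux J_{i+1/2} = (nu/h)*[B(-z)*rho_i - B(z)*rho_{i+1}]
    with face Peclet number z = v*h/nu, on a periodic 1D grid."""
    def B(z):
        out = np.ones_like(z)
        nz = np.abs(z) > 1e-12
        out[nz] = z[nz] / np.expm1(z[nz])   # stable evaluation near z = 0
        return out
    z = v_face * h / nu
    return (nu / h) * (B(-z) * rho - B(z) * np.roll(rho, -1))

def fp_step(rho, v_face, h, dt, nu):
    """Conservative explicit update rho_i += (dt/h)*(J_{i-1/2} - J_{i+1/2})."""
    J = sg_flux(rho, v_face, h, nu)
    return rho + (dt / h) * (np.roll(J, 1) - J)

# demo: constant rightward drift; mass is conserved exactly (telescoping
# fluxes) and positivity is preserved for this small time step
N = 64
h = 1.0 / N
x = np.arange(N) * h
rho = 1.0 + 0.5 * np.cos(2 * np.pi * x)   # smooth positive density, unit mass
for _ in range(100):
    rho = fp_step(rho, np.full(N, 2.0), h, 1e-4, 0.05)
```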

12.3.3 Policy Iteration

Policy iteration is a powerful algorithm for solving the HJB equation that converges superlinearly (Newton-like), compared to the linear convergence of value iteration. The algorithm alternates between two steps:

Policy Iteration Algorithm

  1. Policy evaluation: Given a policy \(\alpha^{(k)}\), solve the linear system for \(u^{(k)}\):

    $$-\nu \Delta u^{(k)} - \alpha^{(k)} \cdot \nabla u^{(k)} = \frac{1}{2}|\alpha^{(k)}|^2 + F(x, \rho)$$

  2. Policy improvement: Update the policy:

    $$\alpha^{(k+1)} = -\nabla u^{(k)}$$

The key advantage: Step 1 requires solving a linear elliptic PDE (since the policy is fixed), which is much cheaper than the nonlinear HJB. Step 2 is a pointwise operation. The convergence is superlinear because policy iteration is equivalent to Newton's method applied to the HJB equation.
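
The two steps can be sketched in 1D. A discount term \(\lambda u\) is added here (an illustrative regularisation, not part of the system above) so that the periodic policy-evaluation system is invertible; dense matrices keep the sketch short.

```python
import numpy as np

def policy_iteration(F, h, nu, lam=1.0, tol=1e-12, max_iter=50):
    """Policy iteration for a discounted 1D HJB on a periodic grid:
        lam*u - nu*u'' + 0.5*(u')^2 = F.
    Evaluation solves the linear system
        lam*u - nu*u'' - alpha*u' = 0.5*alpha^2 + F
    with upwind differencing of the drift (keeps the matrix an M-matrix);
    improvement sets alpha = -u' via a central difference.
    """
    N = len(F)
    I = np.eye(N)
    L2 = (np.roll(I, 1, axis=1) - 2*I + np.roll(I, -1, axis=1)) / h**2
    Dm = (I - np.roll(I, 1, axis=1)) / h    # backward difference
    Dp = (np.roll(I, -1, axis=1) - I) / h   # forward difference
    u = np.zeros(N)
    alpha = np.zeros(N)
    for k in range(max_iter):
        # upwind the drift term alpha*u': forward where alpha > 0, backward where alpha < 0
        drift = np.maximum(alpha, 0)[:, None] * Dp + np.minimum(alpha, 0)[:, None] * Dm
        A = lam*I - nu*L2 - drift
        u_new = np.linalg.solve(A, 0.5 * alpha**2 + F)
        alpha = -(np.roll(u_new, -1) - np.roll(u_new, 1)) / (2*h)
        done = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if done:
            break
    return u, alpha, k + 1

# sanity demo: constant cost F = 2 has the constant solution u = F/lam, alpha = 0
u, alpha, iters = policy_iteration(np.full(32, 2.0), h=1.0/32, nu=0.1)
```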

Convergence Rate

Policy iteration typically converges in 5–10 iterations regardless of the problem size, compared to \(O(1/\epsilon)\) iterations for value iteration. The cost per iteration is dominated by the linear solve, which scales as \(O(N)\) for tridiagonal systems (1D) or \(O(N \log N)\) for sparse systems (2D+).

12.3.4 The CCD Perspective: Equilibrium as Fixed Point

The discrete MFG system reveals a beautiful circular structure. At equilibrium \((\rho^*, u^*)\):

  • HJB: The density \(\rho^*\) determines the value function \(u^*\) through the Bellman equation. Density determines costs, which determine optimal behaviour.
  • Optimal control: The value function \(u^*\) determines the optimal policy \(\alpha^* = -\nabla u^*\).
  • FP: The optimal controls \(\alpha^*\) regenerate the density \(\rho^*\) through the Fokker-Planck equation. Behaviour determines density.

The full loop is:

$$\rho^* \xrightarrow{\text{HJB}} u^* \xrightarrow{\nabla} \alpha^* \xrightarrow{\text{FP}} \rho^*$$

This is the self-consistency condition that defines the MFG Nash equilibrium. Numerically, all three algorithms (fixed-point iteration, policy iteration, and Newton's method) are different strategies for finding this fixed point.

12.3.5 Full Numerical MFG Solver

We implement a complete MFG solver combining policy iteration for the HJB, the Scharfetter-Gummel scheme for the FP, and an outer fixed-point loop for self-consistency.

Full MFG Solver: Policy Iteration + Scharfetter-Gummel

(Interactive example: script.py, 233 lines.)
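
The original script is not reproduced here. The following condensed sketch shows the structure under illustrative assumptions: an assumed coupling \(F(x,\rho) = \rho + 0.2\cos(2\pi x)\), explicit time stepping in place of the implicit scheme, and assumed grid and damping parameters. It runs a backward HJB sweep with the upwind Hamiltonian, a forward FP sweep with Scharfetter-Gummel fluxes, and a damped outer fixed-point loop on the density path.

```python
import numpy as np

def solve_mfg(N=32, Nt=400, T=0.5, nu=0.1, theta=0.5, outer=30, tol=1e-6):
    """Condensed MFG solver sketch (illustrative, not the original script):
    explicit backward HJB + forward Scharfetter-Gummel FP + damped outer
    fixed point. The time step satisfies the explicit CFL conditions."""
    h, dt = 1.0 / N, T / Nt
    x = np.arange(N) * h
    pot = 0.2 * np.cos(2 * np.pi * x)       # illustrative potential term

    def B(z):                                # stable Bernoulli function
        out = np.ones_like(z)
        nz = np.abs(z) > 1e-12
        out[nz] = z[nz] / np.expm1(z[nz])
        return out

    def hjb_backward(R):                     # explicit sweep from t = T to 0
        u = np.zeros(N)                      # terminal cost g = 0 (assumed)
        U = np.zeros((Nt + 1, N)); U[Nt] = u
        for n in range(Nt - 1, -1, -1):
            Dm = (u - np.roll(u, 1)) / h
            Dp = (np.roll(u, -1) - u) / h
            H = 0.5 * np.maximum(Dm, 0)**2 + 0.5 * np.minimum(Dp, 0)**2
            lap = (np.roll(u, -1) - 2*u + np.roll(u, 1)) / h**2
            u = u + dt * (nu * lap - H + R[n] + pot)   # F(x, rho) = rho + pot
            U[n] = u
        return U

    def fp_forward(U):                       # SG sweep from t = 0 to T
        rho = 1.0 + 0.5 * np.cos(2 * np.pi * x)        # unit-mass initial density
        R = np.zeros((Nt + 1, N)); R[0] = rho
        for n in range(Nt):
            v = -(np.roll(U[n], -1) - np.roll(U[n], 1)) / (2*h)  # alpha = -grad u
            vf = 0.5 * (v + np.roll(v, -1))                      # value at face i+1/2
            z = vf * h / nu
            J = (nu / h) * (B(-z) * rho - B(z) * np.roll(rho, -1))
            rho = rho + (dt / h) * (np.roll(J, 1) - J)
            R[n + 1] = rho
        return R

    R = np.ones((Nt + 1, N))                 # initial guess: uniform density path
    res = np.inf
    for _ in range(outer):
        U = hjb_backward(R)
        R_new = fp_forward(U)
        res = np.max(np.abs(R_new - R))      # fixed-point residual
        R = (1 - theta) * R + theta * R_new  # damped update
        if res < tol:
            break
    return x, R, U, res

x, R, U, res = solve_mfg()
```

The damped update keeps the density path a convex combination of positive, unit-mass paths, so mass conservation and positivity survive the outer loop.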

12.3.6 Policy Iteration Convergence Analysis

We demonstrate the superlinear convergence of policy iteration compared to value iteration. The key diagnostic is the convergence rate: policy iteration achieves machine precision in 5–10 steps, while value iteration requires hundreds.

Policy Iteration vs Value Iteration: Convergence Comparison

(Interactive example: script.py, 142 lines.)
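
The original comparison script is likewise not reproduced. The sketch below sets up a small discounted control problem in the transition-matrix form of Section 12.3.1 and counts iterations for both methods; the grid, discount \(\lambda\), and control set are illustrative assumptions.

```python
import numpy as np

def build_P(alpha, N, h, dt, nu):
    """Row-stochastic transition matrix on a periodic grid: symmetric
    diffusion jumps plus an upwinded drift jump for control alpha."""
    d = nu * dt / h**2
    pr = d + dt * max(alpha, 0.0) / h
    pl = d + dt * max(-alpha, 0.0) / h
    P = np.zeros((N, N))
    idx = np.arange(N)
    P[idx, idx] = 1.0 - pl - pr
    P[idx, (idx + 1) % N] = pr
    P[idx, (idx - 1) % N] = pl
    return P

def compare(N=25, h=0.04, dt=5e-3, nu=0.1, lam=10.0):
    x = np.arange(N) * h
    F = np.cos(2 * np.pi * x)                     # illustrative state cost
    controls = np.linspace(-2.0, 2.0, 21)         # illustrative control set
    gamma = 1.0 / (1.0 + lam * dt)                # per-step discount factor
    Ps = [build_P(a, N, h, dt, nu) for a in controls]
    Cs = [dt * (0.5 * a**2 + F) for a in controls]

    def bellman(u):                               # one Bellman operator sweep
        vals = np.array([c + gamma * (P @ u) for P, c in zip(Ps, Cs)])
        return vals.min(axis=0), vals.argmin(axis=0)

    # value iteration: linear convergence with contraction factor gamma
    u_vi = np.zeros(N)
    n_vi = 0
    for n_vi in range(1, 3001):
        u_new, _ = bellman(u_vi)
        delta = np.max(np.abs(u_new - u_vi))
        u_vi = u_new
        if delta < 1e-12:
            break

    # policy iteration: exact evaluation (linear solve) + greedy improvement
    u_pi = np.zeros(N)
    pol = np.zeros(N, dtype=int)
    n_pi = 0
    for n_pi in range(1, 51):
        P = np.array([Ps[a][i] for i, a in enumerate(pol)])
        c = np.array([Cs[a][i] for i, a in enumerate(pol)])
        u_new = np.linalg.solve(np.eye(N) - gamma * P, c)
        delta = np.max(np.abs(u_new - u_pi))
        u_pi = u_new
        _, pol = bellman(u_pi)
        if delta < 1e-12:
            break
    return n_vi, n_pi, np.max(np.abs(u_vi - u_pi))

n_vi, n_pi, gap = compare()
```

Both methods solve the same finite Markov decision problem, so they agree to the stopping tolerance; only the iteration counts differ dramatically.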

Key Takeaways

  • The Achdou-Capuzzo-Dolcetta upwind scheme ensures monotonicity of the HJB discretisation, guaranteeing convergence to the viscosity solution.
  • The Scharfetter-Gummel exponential fitting scheme ensures positivity of the FP density, using the Bernoulli function \(B(z) = z/(e^z - 1)\).
  • Policy iteration converges superlinearly (Newton-like), typically in 5–10 steps, far outperforming value iteration.
  • The discrete MFG equilibrium has a circular structure: \(\rho^* \to u^* \to \alpha^* \to \rho^*\).
  • The transition-matrix duality, \(P(\alpha)\) for the HJB and \(P(\alpha^*)^T\) for the FP, encodes the adjoint relationship between optimality and consistency.