Bayesian MCMC for Urban Model Calibration
Every urban simulation carries free parameters—growth rates, carrying capacities, diffusion coefficients. Bayesian inference with Markov Chain Monte Carlo (MCMC) lets us turn observed city data into full posterior distributions over those parameters, quantifying not just best-fit values but our uncertainty about them.
1. Bayes' Theorem
We begin with the foundational identity of Bayesian statistics. Given observed data \(D\) and a parameter vector \(\boldsymbol{\theta}\), the posterior distribution combines our prior beliefs with the information in the data:
$$P(\boldsymbol{\theta} \mid D) = \frac{P(D \mid \boldsymbol{\theta})\,P(\boldsymbol{\theta})}{P(D)}$$
Since the evidence \(P(D)\) is a normalising constant that does not depend on \(\boldsymbol{\theta}\), we write the unnormalised posterior as:
$$P(\boldsymbol{\theta} \mid D) \;\propto\; P(D \mid \boldsymbol{\theta})\,P(\boldsymbol{\theta})$$
The three ingredients are:
- Prior \(P(\boldsymbol{\theta})\) — encodes domain knowledge before seeing data (e.g., growth rates are positive).
- Likelihood \(P(D \mid \boldsymbol{\theta})\) — probability of the observations given a particular parameter setting.
- Posterior \(P(\boldsymbol{\theta} \mid D)\) — the updated belief after incorporating the data.
For an urban growth model with parameters \(r\) (intrinsic growth rate) and \(K\) (carrying capacity), we want to infer \(\boldsymbol{\theta} = (r, K)\) from a time series of observed population counts \(D = \{N_0, N_1, \dots, N_T\}\).
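As a minimal sketch of how such priors are encoded in practice (the ranges and function interface below are illustrative assumptions, not values fixed by the model):

```python
import numpy as np

def log_prior(r, K, sigma):
    """Flat (uniform) priors on a plausible box -- illustrative ranges only."""
    if 0.0 < r < 1.0 and 1e4 < K < 1e7 and 0.0 < sigma < 1e5:
        return 0.0       # log of a constant density; the constant cancels in MCMC
    return -np.inf       # zero prior probability outside the box
```

Returning \(-\infty\) outside the support means any proposal that violates the prior is rejected automatically in the log-space acceptance test.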
2. Constructing the Likelihood
Consider the logistic growth model as the deterministic skeleton:
$$\frac{dN}{dt} = r\,N\!\left(1 - \frac{N}{K}\right)$$
With analytic solution:
$$N(t) = \frac{K}{1 + \left(\frac{K}{N_0} - 1\right)e^{-rt}}$$
We assume each observation is corrupted by Gaussian noise with unknown standard deviation \(\sigma\):
$$N_i^{\text{obs}} = N(t_i; r, K) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$
The log-likelihood for the \(T\) independent observations \(N_1, \dots, N_T\) (treating \(N_0\) as the known initial condition) becomes:
$$\ln P(D \mid r, K, \sigma) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{T}\bigl(N_i^{\text{obs}} - N(t_i; r, K)\bigr)^2$$
Working in log-space is essential for numerical stability: likelihood values can be astronomically small, but their logarithms remain manageable floating-point numbers.
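A direct translation of the analytic solution and the log-likelihood above into code (the function names and signatures are our own):

```python
import numpy as np

def logistic(t, r, K, N0):
    """Analytic solution of the logistic ODE dN/dt = r*N*(1 - N/K)."""
    return K / (1.0 + (K / N0 - 1.0) * np.exp(-r * t))

def log_likelihood(params, t, N_obs, N0):
    """Gaussian log-likelihood of observed counts around the logistic curve."""
    r, K, sigma = params
    resid = N_obs - logistic(t, r, K, N0)
    T = len(N_obs)
    return -0.5 * T * np.log(2 * np.pi * sigma**2) \
           - 0.5 * np.sum(resid**2) / sigma**2
```

With zero residuals the sum term vanishes and only the \(-\tfrac{T}{2}\ln(2\pi\sigma^2)\) normalisation remains, which is a handy sanity check.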
3. The Metropolis-Hastings Algorithm
MCMC generates a Markov chain whose stationary distribution is the posterior. The Metropolis-Hastings algorithm is the workhorse. At each step:
MH Algorithm Steps
- Given current state \(\boldsymbol{\theta}\), propose a candidate \(\boldsymbol{\theta}'\) from a proposal distribution \(q(\boldsymbol{\theta}' \mid \boldsymbol{\theta})\).
- Compute the acceptance ratio:
$$\alpha = \min\!\left(1,\;\frac{P(D \mid \boldsymbol{\theta}')\,P(\boldsymbol{\theta}')\,q(\boldsymbol{\theta} \mid \boldsymbol{\theta}')}{P(D \mid \boldsymbol{\theta})\,P(\boldsymbol{\theta})\,q(\boldsymbol{\theta}' \mid \boldsymbol{\theta})}\right)$$
- Draw \(u \sim \text{Uniform}(0,1)\). If \(u < \alpha\), accept: set \(\boldsymbol{\theta} \leftarrow \boldsymbol{\theta}'\). Otherwise reject: keep \(\boldsymbol{\theta}\).
For a symmetric proposal where \(q(\boldsymbol{\theta}' \mid \boldsymbol{\theta}) = q(\boldsymbol{\theta} \mid \boldsymbol{\theta}')\) (e.g., a Gaussian random walk centered on the current state), the proposal terms cancel and the acceptance ratio simplifies to the posterior ratio:
$$\alpha = \min\!\left(1,\;\frac{P(D \mid \boldsymbol{\theta}')\,P(\boldsymbol{\theta}')}{P(D \mid \boldsymbol{\theta})\,P(\boldsymbol{\theta})}\right)$$
In log-space, we compute:
$$\ln\alpha = \min\!\bigl(0,\;\bigl[\ln P(D \mid \boldsymbol{\theta}') + \ln P(\boldsymbol{\theta}')\bigr] - \bigl[\ln P(D \mid \boldsymbol{\theta}) + \ln P(\boldsymbol{\theta})\bigr]\bigr)$$
and accept if \(\ln u < \ln\alpha\). This avoids exponentiation of large negative numbers.
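The accept/reject step, written in log-space for a symmetric Gaussian random-walk proposal (a sketch; the interface is our own choice):

```python
import numpy as np

def mh_step(theta, log_post, step_sizes, rng):
    """One Metropolis update with a symmetric Gaussian random-walk proposal."""
    proposal = theta + step_sizes * rng.normal(size=theta.shape)
    log_alpha = log_post(proposal) - log_post(theta)  # ln(alpha) before min(0, .)
    if np.log(rng.uniform()) < log_alpha:             # accept with prob min(1, alpha)
        return proposal, True
    return theta, False
```

Note that the explicit \(\min(0, \cdot)\) is unnecessary in code: when \(\ln\alpha \ge 0\), the test \(\ln u < \ln\alpha\) always passes because \(\ln u < 0\).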
4. Burn-in, Thinning, and Convergence
The initial samples from the chain are influenced by the starting point and do not represent the posterior. We discard these as the burn-in period. Typical practice: discard the first 25–50% of samples.
Thinning retains every \(k\)-th sample to reduce autocorrelation. If the chain has autocorrelation length \(\tau\), we thin by keeping every \(\tau\)-th sample. The effective sample size is:
$$n_{\text{eff}} = \frac{n}{1 + 2\sum_{k=1}^{\infty}\rho(k)}$$
where \(\rho(k)\) is the lag-\(k\) autocorrelation.
Acceptance rate is a practical diagnostic. For Gaussian random-walk proposals, the optimal acceptance rate is roughly 44% in one dimension, falling to about 23% in high dimensions. Too high an acceptance rate means the proposals are too timid (slow exploration); too low means they are too ambitious (many rejections).
Convergence Checklist
- Trace plots should show “hairy caterpillar” mixing (no trends or stuck periods)
- Multiple chains from different starting points should converge to the same region
- Gelman-Rubin statistic \(\hat{R} < 1.1\)
- Effective sample size \(n_{\text{eff}} > 200\) per parameter
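The effective-sample-size formula can be estimated from a chain by truncating the empirical autocorrelation sum at the first negative lag (one common heuristic; more careful truncation rules exist):

```python
import numpy as np

def effective_sample_size(chain):
    """n_eff = n / (1 + 2*sum_k rho(k)), truncated at the first negative rho."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Empirical autocorrelation at lags 0..n-1 via full cross-correlation.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x.var() * n)
    tau = 1.0
    for k in range(1, n):
        if acf[k] < 0:
            break
        tau += 2.0 * acf[k]
    return n / tau
```

An independent (white-noise) chain yields \(n_{\text{eff}} \approx n\); a strongly autocorrelated chain yields far fewer effective samples.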
5. Full MH Sampler for Logistic Growth
Below we implement a complete Metropolis-Hastings sampler that calibrates a logistic growth model to synthetic city population data. The code generates synthetic observations, runs the MCMC chain, and produces trace plots plus posterior histograms.
Metropolis-Hastings MCMC for Logistic Urban Growth
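A self-contained sketch of such a sampler, assembled from the pieces above. The synthetic "true" values, flat box priors, and hand-tuned step sizes are all illustrative choices; trace plots and posterior histograms can be drawn from the `chain` array with matplotlib.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- 1. Synthetic "observed" city population (illustrative true values) ---
r_true, K_true, sigma_true, N0 = 0.25, 800_000.0, 20_000.0, 50_000.0
t = np.arange(1, 41)                             # 40 yearly observations
N_det = K_true / (1 + (K_true / N0 - 1) * np.exp(-r_true * t))
N_obs = N_det + rng.normal(0.0, sigma_true, size=t.size)

# --- 2. Unnormalised log-posterior (flat priors on an assumed plausible box) ---
def log_posterior(theta):
    r, K, sigma = theta
    if not (0 < r < 1 and N0 < K < 5e6 and 0 < sigma < 2e5):
        return -np.inf                           # outside the prior support
    model = K / (1 + (K / N0 - 1) * np.exp(-r * t))
    resid = N_obs - model
    return -0.5 * t.size * np.log(2 * np.pi * sigma**2) \
           - 0.5 * np.sum(resid**2) / sigma**2

# --- 3. Metropolis-Hastings with a symmetric Gaussian random-walk proposal ---
n_steps = 20_000
step = np.array([0.005, 5_000.0, 1_500.0])       # hand-tuned step sizes
theta = np.array([0.1, 500_000.0, 50_000.0])     # deliberately off-target start
chain = np.empty((n_steps, 3))
accepted = 0
lp = log_posterior(theta)
for i in range(n_steps):
    proposal = theta + step * rng.normal(size=3)
    lp_prop = log_posterior(proposal)
    if np.log(rng.uniform()) < lp_prop - lp:     # log-space acceptance test
        theta, lp = proposal, lp_prop
        accepted += 1
    chain[i] = theta

# --- 4. Discard burn-in and summarise the posterior ---
post = chain[n_steps // 2:]                      # drop the first 50% as burn-in
print(f"acceptance rate: {accepted / n_steps:.2f}")
for name, col in zip(("r", "K", "sigma"), post.T):
    print(f"{name}: mean={col.mean():.4g}, "
          f"95% CI=({np.quantile(col, 0.025):.4g}, {np.quantile(col, 0.975):.4g})")
```

With informative data like this, the posterior means should land near the generating values, and the credible intervals quantify how tightly the data pin each parameter down.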
6. Urban Interpretation
The posterior distribution over \((r, K)\) carries direct planning implications:
- A wide posterior on \(K\) means the city's ultimate size is highly uncertain—infrastructure plans should account for a range of scenarios.
- The joint posterior can reveal correlations: if \(r\) and \(K\) are negatively correlated, fast-growing cities tend to saturate at lower populations (resource-limited growth).
- Bayesian credible intervals provide honest uncertainty bounds for population forecasts, unlike point estimates from least-squares fitting.
- The approach extends naturally to spatial models: calibrate diffusion coefficients, attraction kernels, or zoning impact parameters from observed land-use data.
Key Takeaway
MCMC does not just find the “best” parameters—it maps out the full landscape of plausible parameter combinations, enabling robust decision-making under uncertainty. This is essential for urban planning where stakes are high and data is noisy.