The Mapper Algorithm

Mapper is a topological algorithm that constructs a simplified “skeleton” of high-dimensional data by pulling back a cover of the range of a filter function. Applied to cities, it reveals the branching structure, loops, and central nodes of urban form—a topological X-ray of metropolitan organization.

1. The Mapper Construction

Given a point cloud \(X\) and a continuous function \(f: X \to \mathbb{R}\) (the filter function), Mapper constructs a simplicial complex that captures the topology of \(X\) through four steps:

Step 1: Choose a filter function. The filter maps each data point to a real value. For urban analysis, natural choices include:

  • Distance from the Central Business District (CBD).
  • Population density or land value.
  • Eccentricity: the maximum geodesic distance to any other point.
  • A principal component of socioeconomic data.

Step 2: Cover the range. Choose an open cover \(\mathcal{U} = \{U_1, U_2, \dots, U_n\}\) of \(f(X) \subseteq \mathbb{R}\). Typically, overlapping intervals of width \(w\) with overlap fraction \(p\):

$$U_i = [a_i, a_i + w], \quad a_{i+1} = a_i + w(1 - p)$$

Step 3: Pull back and cluster. For each cover element \(U_i\), consider the preimage \(f^{-1}(U_i) \cap X\)—all data points whose filter value falls in \(U_i\). Apply a clustering algorithm (e.g., single-linkage, DBSCAN) to partition each preimage into clusters:

$$f^{-1}(U_i) \cap X = C_{i,1} \sqcup C_{i,2} \sqcup \cdots \sqcup C_{i,k_i}$$

Step 4: Build the nerve. Create a vertex for each cluster \(C_{i,j}\). Connect two vertices with an edge if their clusters share data points (which can happen when cover elements overlap). More generally, form a \(k\)-simplex whenever \(k+1\) clusters have a common point.

The resulting simplicial complex is the Mapper graph—a topological skeleton that captures the essential shape of the data as seen through the filter function.

2. Mathematical Foundation: The Nerve Theorem

Mapper is grounded in the Nerve Theorem, which guarantees that the nerve of a “good” cover captures the topology of the underlying space:

$$\text{If } \mathcal{U} = \{U_i\} \text{ is a good cover of } X, \text{ then } \mathcal{N}(\mathcal{U}) \simeq X$$

where \(\mathcal{N}(\mathcal{U})\) is the nerve and \(\simeq\) denotes homotopy equivalence. A cover is “good” if all finite intersections \(U_{i_1} \cap \cdots \cap U_{i_k}\) are either empty or contractible.

In Mapper, we work with the pullback cover \(\{f^{-1}(U_i)\}\). The clustering step refines each preimage into connected components, ensuring the cover elements are as close to contractible as the data allows. The quality of the Mapper output depends on:

  • Filter function: Should capture meaningful variation in the data. Distance from CBD captures the monocentric structure.
  • Cover resolution: Number and overlap of intervals. More intervals give finer resolution but more complex output.
  • Clustering parameters: Control how aggressively points are grouped. Affects the granularity of the skeleton.

3. Urban Interpretation of Mapper Features

The Mapper graph of a city reveals its organisational skeleton:

  • Linear chains: Radial corridors extending from the centre. Each node is a distance band from the CBD, and the linear structure shows how the city extends along major axes.
  • Branch points: Where the city splits into distinct corridors or sectors. A node with degree > 2 indicates a fork in the urban structure.
  • Loops: Ring roads, orbital connections, or areas where two corridors reconverge. Loops in the Mapper graph correspond to genuine loops in the urban fabric.
  • Disconnected components: Satellite towns or isolated developments with no continuous urban connection to the main city at the chosen scale.
  • Hub nodes (large clusters): Nodes containing many data points represent dense, well-connected areas. The CBD typically appears as the largest hub.

4. Mapper with Distance-from-CBD Filter

We implement the full Mapper algorithm for a synthetic city with a central business district, radial corridors, and a ring road. The filter function is Euclidean distance from the CBD.

Mapper Algorithm with Distance-from-CBD Filter

Python
script.py260 lines

Click Run to execute the Python code

Code will be executed with Python 3 on the server

5. Union-Find for Betti Numbers

Computing \(\beta_0\) (connected components) is efficiently done via the Union-Find (disjoint set) data structure, which supports near-constant-time union and find operations with path compression and union by rank.

The cycle rank formula then gives \(\beta_1\) directly:

$$\beta_1 = E - V + \beta_0$$

where \(E\) is the number of edges and \(V\) the number of vertices. This formula follows from the Euler characteristic: \(\chi = V - E = \beta_0 - \beta_1\).

Below is a Fortran-style implementation in Python that demonstrates the Union-Find algorithm applied to a Mapper graph, computing both Betti numbers efficiently.

Union-Find for Betti Numbers (Fortran-style implementation)

Python
script.py222 lines

Click Run to execute the Python code

Code will be executed with Python 3 on the server

6. Mapper Parameters and Sensitivity

The Mapper output depends on three main parameters. Understanding their effect is essential for reliable urban analysis:

  • Number of intervals \(n\): Controls resolution. Too few intervals produce a coarse skeleton that misses branching structure. Too many create a complex graph dominated by noise. Typical range: 8-30 for urban datasets.
  • Overlap fraction \(p\): Controls connectivity. Higher overlap means more edges in the Mapper graph (more chances for shared points). Typical range: 20-50%.
  • Clustering threshold \(\varepsilon_c\): Controls granularity within each interval. Smaller threshold means more, smaller clusters (more nodes). Must be calibrated to the spatial scale of the data.

The statistical Mapper framework provides stability results: for sufficiently large datasets, the Mapper graph converges to the Reeb graph of the filter function, which is a topological invariant of the data:

$$\text{Reeb}_f(X) = X / \sim \quad \text{where } x \sim y \iff f(x) = f(y) \text{ and } x, y \text{ connected in } f^{-1}(f(x))$$

The Reeb graph collapses each connected component of a level set to a single point, producing the simplest possible representation of the topology through the filter lens. Mapper approximates this Reeb graph for finite data, with the approximation quality improving as the number of data points grows.

7. The Topological Toolkit for Urban Analysis

Modules 15 has assembled a complete topological toolkit for urban analysis:

  • Simplicial homology: Computes Betti numbers at a single scale. Best for comparing canonical urban forms.
  • Persistent homology: Tracks topological features across scales. Reveals multi-scale structure and separates signal from noise.
  • Mapper: Constructs a topological skeleton through a chosen filter. Reveals branching, loops, and hubs in the urban fabric.

Together, these tools provide a coordinate-free, scale-invariant, and deformation-robust language for describing urban form. Unlike metric measures (street density, block area), topological invariants capture the qualitative structure of a city: whether it has loops, branches, disconnections, or enclosed regions.

Key Takeaway

The Mapper algorithm produces a topological skeleton of the city by pulling back a cover of a filter function's range. With distance-from-CBD as the filter, it reveals radial corridors (linear chains), ring roads (loops), branching points (high-degree nodes), and the hierarchical organisation of the metropolitan area. Combined with Union-Find for efficient Betti number computation, it provides a practical and theoretically grounded approach to understanding urban form.