Module 5 · Beyond the Single Objective

Novelty & Quality Diversity

Joel Lehman and Kenneth Stanley’s paper “Abandoning Objectives” (2011, building on their 2008 introduction of novelty search) argued that chasing the objective on hard problems is often the worst possible strategy: the objective gradient lies, deceptive local optima dominate the landscape, and the way forward runs through stepping-stones whose value is only visible in retrospect. The response — novelty search and the broader Quality-Diversity programme — has reshaped large parts of evolutionary robotics, RL exploration, and protein design.

1. Deception

A fitness landscape is deceptive when local gradient information points away from the global optimum. The canonical example is the hard maze: a robot must navigate from start to goal, and fitness is the Euclidean distance to the goal at episode end. Walls force the robot to detour through the maze’s open passages, yet every step of that detour increases its distance to the goal. Gradient-based RL, objective-driven evolution, and even most exploration bonuses become trapped near a wall, unable to take the temporarily-worsening steps that solve the problem.
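The same trap can be shown in one dimension. A toy sketch (the function and hill climber below are illustrative, not from any paper): a landscape with a broad deceptive peak and a narrow, taller global optimum, where greedy ascent from the origin never crosses the valley between them.

```python
def fitness(x):
    # broad local optimum at x = 0.2, narrow global optimum at x = 0.9,
    # separated by a valley: the local gradient points away from the global peak
    return max(1.0 - abs(x - 0.2) * 5, 2.0 - abs(x - 0.9) * 20)

def hill_climb(x, steps=200, eps=0.01):
    # greedy ascent: move to a neighbouring point only if fitness improves
    for _ in range(steps):
        for cand in (x - eps, x + eps):
            if fitness(cand) > fitness(x):
                x = cand
    return x
```

Starting at x = 0.0 the climber converges to the deceptive peak near 0.2 and stays there, even though the optimum at 0.9 is twice as fit.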

Lehman & Stanley showed that for their hard-maze benchmark, novelty search — optimising only behavioural novelty, ignoring the fitness signal entirely — solves the maze far more reliably than fitness-driven search. The objective obstructs its own solution.

2. Novelty Search

Define a behaviour descriptor \(\boldsymbol{\beta}(\mathbf{x}) \in \mathcal{B}\) that captures what an individual does rather than how well it does it. For a maze runner, \(\boldsymbol{\beta}\) might be the \((x, y)\) coordinates of its final position. Maintain an archive \(\mathcal{A}\) of past behaviours. Define novelty

\[ \rho(\mathbf{x}) \;=\; \frac{1}{k}\sum_{i=1}^{k} \mathrm{dist}\bigl(\boldsymbol{\beta}(\mathbf{x}),\,\boldsymbol{\beta}_i^{\mathrm{NN}}\bigr) \]

the mean distance to \(\mathbf{x}\)’s \(k\)-nearest neighbours in \(\mathcal{A}\) (typically \(k=15\)). Run a standard EA but use \(\rho\) in place of fitness. Add to the archive any individual whose novelty exceeds a threshold. The result is a search that systematically fans out across the behaviour space — collecting solutions to problems the algorithm was never told about.
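In code the novelty score is a few lines. A minimal sketch (the name `novelty` and the plain-list archive are illustrative; real implementations usually also count the current population among the neighbours and use a k-d tree for speed):

```python
import math

def novelty(b, archive, k=15):
    # mean Euclidean distance from behaviour b to its k nearest
    # neighbours in the archive (fewer if the archive is small)
    dists = sorted(math.dist(b, a) for a in archive)
    return sum(dists[:k]) / min(k, len(dists))
```

An individual is then added to \(\mathcal{A}\) whenever its score exceeds the archive threshold.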

3. From Novelty to Quality Diversity

Pure novelty search ignores quality. The Quality-Diversity (QD) programme (Pugh, Soros & Stanley 2016) replaces the single fitness objective with a joint goal: find a diverse set of high-quality solutions, one per behaviour cell. The output is not a single best solution but an illuminated map of the behaviour space, each cell occupied by the best individual ever found that exhibited that behaviour.

QD is the natural framework for: robot gait libraries (one fast gait per body configuration); level design (one playable level per difficulty/style cell); protein libraries (one folded sequence per topology); creative-AI applications (one image or piece of music per stylistic region).

4. MAP-Elites

MAP-Elites (Mouret & Clune 2015) is the flagship QD algorithm: minimal, robust, parallel-friendly. The behaviour space \(\mathcal{B}\) is discretised into a regular grid of cells. Each cell holds at most one individual: the elite for that cell, i.e. the highest-fitness individual found so far whose behaviour falls in that cell.
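Mapping a continuous descriptor to a grid cell is a single scale-floor-clamp per dimension. A sketch under assumed names (`cell_index` and its `lo`/`hi` bounds and `bins` counts are all illustrative):

```python
def cell_index(beta, lo, hi, bins):
    # per dimension: normalise to [0, 1], scale by the bin count, floor,
    # and clamp so the upper boundary falls into the last bin
    return tuple(
        min(int((b - l) / (h - l) * n), n - 1)
        for b, l, h, n in zip(beta, lo, hi, bins)
    )
```

The resulting tuple is hashable, so the grid itself can simply be a dictionary keyed by cell.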

import random

def map_elites(sample, mutate, fitness, beta, n_init, n_iters):
    grid = {}                                  # cell -> (elite genome, its fitness)
    def place(x):
        cell, f = beta(x), fitness(x)          # beta maps a genome to a hashable cell
        if cell not in grid or f > grid[cell][1]:
            grid[cell] = (x, f)                # x becomes the elite for its cell
    for _ in range(n_init):                    # seed the grid with random genomes
        place(sample())
    for _ in range(n_iters):                   # mutate a uniformly random elite
        parent, _ = random.choice(list(grid.values()))
        place(mutate(parent))                  # child competes in its own cell
    return grid

Two metrics summarise a MAP-Elites run: coverage (fraction of non-empty cells) and QD-score (sum of fitness over all elites). Optimising fitness alone gives a single point; MAP-Elites gives an entire variety of high-performing solutions, which is often what designers actually want.
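Both metrics are one-liners over the grid. A sketch, assuming the grid is a dict from cell to (genome, fitness) and fitness is non-negative (in practice fitness is often offset so that occupying a cell can never lower the QD-score):

```python
def qd_metrics(grid, total_cells):
    # coverage: fraction of cells currently holding an elite
    # QD-score: summed fitness of all elites in the grid
    coverage = len(grid) / total_cells
    qd_score = sum(f for _, f in grid.values())
    return coverage, qd_score
```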

5. CMA-ME and Beyond

The variation operator in MAP-Elites is the algorithm’s primary lever. CMA-ME (Fontaine, Togelius, Nikolaidis & Hoover 2020) replaces it with CMA-ES emitters, each a small CMA-ES instance pursuing improvement of an existing elite. Three emitter types — improvement, random direction, optimising — balance exploitation of known elites against exploration of new behaviour cells. CMA-ME and its successor CMA-MAE (Fontaine & Nikolaidis 2023) are the modern QD state of the art on continuous benchmarks.

Other extensions: CVT-MAP-Elites (Vassiliades et al. 2018) replaces the regular grid with a centroidal Voronoi tessellation, scaling to 100+ behaviour dimensions; Differentiable QD (Fontaine & Nikolaidis 2021) uses gradients of behaviour and fitness to accelerate exploitation; Policy Gradient Assisted MAP-Elites (Nilsson & Cully 2021) adds TD3-style policy-gradient variation for high-dimensional control.

6. Behaviour Descriptors: The Hard Choice

The behaviour descriptor is the core design choice in QD. A poor descriptor — one that does not align with the structure of the search problem — collapses the algorithm to either trivial novelty (every individual lands in a different cell) or trivial fitness search (every individual lands in the same cell).

Two general approaches to the descriptor problem:

  • Hand-engineered: use domain knowledge (final position, average velocity, foot-contact pattern). Effective when the domain is well-understood.
  • Learned descriptors: AURORA (Cully 2019), UR-MAP-Elites — an autoencoder is trained on observed behaviours and its latent space serves as the descriptor, so the descriptor co-evolves with the search: purely unsupervised novelty.
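As a minimal stand-in for a learned descriptor (AURORA trains an autoencoder and periodically re-encodes the archive; the fixed PCA projection below plays the role of the latent space, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 16))     # stand-in for raw observed behaviours
raw = raw - raw.mean(axis=0)         # centre before extracting components

# top-2 principal directions (via SVD) define a 2-D descriptor space
_, _, vt = np.linalg.svd(raw, full_matrices=False)

def descriptor(b):
    return b @ vt[:2].T              # project raw behaviour(s) to 2-D

latent = descriptor(raw)
```

Each individual's `latent` coordinates would then be discretised into grid cells exactly as with a hand-engineered descriptor.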

7. The Lehman–Stanley Worldview

Beyond the algorithms, novelty/QD work changed how researchers think about optimisation in deceptive domains. Why Greatness Cannot Be Planned (Stanley & Lehman 2015) argues that progress in open-ended creative domains — science, art, technology, evolution itself — relies on stepping stones whose value is only legible in retrospect. Optimising for an explicit objective can actively prevent the discovery of those stepping stones. Open-ended algorithms (Module 7) push this thesis to its logical conclusion.