Module 5 · Weeks 10–11 · Graduate
Information Theory & Musical Expectation
Harmonic progressions modelled as Markov chains reveal measurable information content. Shannon entropy quantifies how “surprising” each chord transition is—high entropy means high uncertainty, low entropy means the next chord is highly predictable.
Markov Harmony Engine
Shannon entropy from I: H = 2.459 bits (max 2.807)
Next-chord probability distribution from I:
Transition matrix (Bach), entries in %:
| from \ to | I | ii | iii | IV | V | vi | vii° |
|---|---|---|---|---|---|---|---|
| I | 5 | 10 | 5 | 25 | 35 | 10 | 10 |
| ii | 10 | 5 | 5 | 10 | 50 | 10 | 10 |
| iii | 5 | 5 | 5 | 40 | 5 | 30 | 10 |
| IV | 15 | 10 | 5 | 5 | 45 | 10 | 10 |
| V | 50 | 5 | 5 | 10 | 5 | 20 | 5 |
| vi | 10 | 25 | 5 | 25 | 25 | 5 | 5 |
| vii° | 45 | 5 | 10 | 5 | 20 | 10 | 5 |
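As a check on the engine's readout, the per-row entropies \( H_i = -\sum_j p_{ij} \log_2 p_{ij} \) of the Bach matrix above can be computed directly. A minimal NumPy sketch, using the matrix exactly as it appears in the table:

```python
import numpy as np

# Bach transition matrix from the table above (rows given in %, so divide by 100).
states = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]
P = np.array([
    [ 5, 10,  5, 25, 35, 10, 10],   # from I
    [10,  5,  5, 10, 50, 10, 10],   # from ii
    [ 5,  5,  5, 40,  5, 30, 10],   # from iii
    [15, 10,  5,  5, 45, 10, 10],   # from IV
    [50,  5,  5, 10,  5, 20,  5],   # from V
    [10, 25,  5, 25, 25,  5,  5],   # from vi
    [45,  5, 10,  5, 20, 10,  5],   # from vii°
]) / 100.0

# Per-row Shannon entropy: H_i = -sum_j p_ij * log2(p_ij)
H = -(P * np.log2(P)).sum(axis=1)
for s, h in zip(states, H):
    print(f"H({s}) = {h:.3f} bits (max {np.log2(7):.3f})")
```

The first row reproduces the H = 2.459 bits shown by the engine for state I; the maximum of log₂ 7 ≈ 2.807 bits would occur only if all seven continuations were equally likely.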
Mathematical Framework
A first-order Markov chain on chord states \( S = \{I, ii, iii, IV, V, vi, vii^\circ\} \) is determined by a stochastic matrix \( P \), where \( P_{ij} = \Pr(X_{t+1} = j \mid X_t = i) \):
\( P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{17} \\ p_{21} & p_{22} & \cdots & p_{27} \\ \vdots & & \ddots & \vdots \\ p_{71} & p_{72} & \cdots & p_{77} \end{pmatrix}, \quad \sum_{j=1}^{7} p_{ij} = 1 \)
Stationary distribution: if \( P \) is irreducible and aperiodic, there exists a unique row vector \( \pi \) satisfying \( \pi P = \pi \) and \( \sum_i \pi_i = 1 \). This is the long-run frequency of each chord degree.
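Since \( \pi P = \pi \) says that \( \pi \) is a left eigenvector of \( P \) with eigenvalue 1, the stationary distribution can be found numerically from an eigendecomposition of \( P^\top \). A sketch, assuming the Bach matrix from the table above:

```python
import numpy as np

# Bach transition matrix (rows sum to 1), from the table above.
P = np.array([
    [.05, .10, .05, .25, .35, .10, .10],
    [.10, .05, .05, .10, .50, .10, .10],
    [.05, .05, .05, .40, .05, .30, .10],
    [.15, .10, .05, .05, .45, .10, .10],
    [.50, .05, .05, .10, .05, .20, .05],
    [.10, .25, .05, .25, .25, .05, .05],
    [.45, .05, .10, .05, .20, .10, .05],
])

# pi P = pi  <=>  pi is a right eigenvector of P.T with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()   # normalise so the entries sum to 1

for s, p in zip(["I", "ii", "iii", "IV", "V", "vi", "vii°"], pi):
    print(f"pi({s}) = {p:.3f}")
```

For an irreducible aperiodic chain the eigenvalue 1 is simple (Perron–Frobenius), so the normalised eigenvector is the unique stationary distribution.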
Shannon entropy of the transition distribution from state \( i \):
\( H_i = -\sum_{j=1}^{7} p_{ij} \log_2 p_{ij} \)
Low \( H_i \) means state \( i \) is highly predictable—the listener “knows” what comes next. High \( H_i \) signals surprise. The entropy rate of the chain is:
\( \bar{H} = \sum_{i} \pi_i H_i = -\sum_{i} \pi_i \sum_{j} p_{ij} \log_2 p_{ij} \)
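Putting the two pieces together, the entropy rate of the Bach matrix is the stationary-weighted average of the row entropies. A sketch, again assuming the matrix from the table:

```python
import numpy as np

# Bach transition matrix from the table (rows sum to 1).
P = np.array([
    [.05, .10, .05, .25, .35, .10, .10],
    [.10, .05, .05, .10, .50, .10, .10],
    [.05, .05, .05, .40, .05, .30, .10],
    [.15, .10, .05, .05, .45, .10, .10],
    [.50, .05, .05, .10, .05, .20, .05],
    [.10, .25, .05, .25, .25, .05, .05],
    [.45, .05, .10, .05, .20, .10, .05],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()

# Entropy rate: H_bar = sum_i pi_i * H_i, the average bits per transition.
H_rows = -(P * np.log2(P)).sum(axis=1)
H_bar = pi @ H_rows
print(f"entropy rate = {H_bar:.3f} bits/transition (max {np.log2(7):.3f})")
```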
This measures the average information per chord transition in the long run. Bach chorales typically have lower entropy rates than jazz standards, reflecting stronger harmonic conventions and tighter voice-leading constraints.
Python Simulation: Shannon Entropy of Harmonic Progressions
Comparing Bach, Romantic, and Jazz transition matrices through the lens of information theory: heatmaps, per-row entropy, and stationary distributions via eigenvalue decomposition.
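The analysis the simulation performs for each style (per-row entropy, stationary distribution via eigendecomposition, entropy rate) can be sketched as one reusable function. Only the Bach matrix from the table above is reproduced here; the Romantic and Jazz matrices and the heatmap plotting are omitted:

```python
import numpy as np

def analyze(P):
    """Per-row entropy, stationary distribution, and entropy rate of a stochastic matrix."""
    # Guard against zero entries: 0 * log2(0) is taken as 0.
    logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
    H_rows = -(P * logP).sum(axis=1)            # H_i per state
    vals, vecs = np.linalg.eig(P.T)             # pi P = pi
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()
    H_rate = pi @ H_rows                        # bits per transition
    return H_rows, pi, H_rate

# Bach matrix from the table above.
bach = np.array([
    [.05, .10, .05, .25, .35, .10, .10],
    [.10, .05, .05, .10, .50, .10, .10],
    [.05, .05, .05, .40, .05, .30, .10],
    [.15, .10, .05, .05, .45, .10, .10],
    [.50, .05, .05, .10, .05, .20, .05],
    [.10, .25, .05, .25, .25, .05, .05],
    [.45, .05, .10, .05, .20, .10, .05],
])
H_rows, pi, H_rate = analyze(bach)
print(f"Bach entropy rate: {H_rate:.3f} bits/transition")
```

Calling `analyze` on each style's matrix and comparing the returned entropy rates reproduces the Bach-versus-jazz comparison described above.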
Advanced: Maximum Entropy Stochastic Grammars
Leistikow, R.J. (2006). Bayesian Modeling of Musical Expectations via Maximum Entropy Stochastic Grammars. Stanford University.
The Markov models above use transition matrices learned directly from data. A more principled approach, developed in Leistikow's Stanford dissertation, is to encode music-theoretic rules as parameterized linear constraints on the transition matrix, then find the distribution that maximizes the entropy rate \( H_r = -\sum_{k,l} \mu_k T_{k,l} \log_2 T_{k,l} \) subject to those constraints (here \( \mu \) is the stationary distribution and \( T \) the transition matrix, corresponding to \( \pi \) and \( P \) above).
This is a convex optimization problem with guaranteed convergence. The maximum entropy rate distribution encodes everything the rules assert while assuming nothing else — the “maximally noncommittal” distribution. By the asymptotic equipartition property, this maximizes the number of typical musical sequences the model generates, making it as widely applicable as possible.
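The flavour of this optimization can be illustrated with a toy sketch (not Leistikow's full grammar machinery): since Shannon entropy is concave, maximizing it under linear constraints is a convex problem a generic solver handles reliably. The "rule" below, that dominant-function chords receive 55% of the probability mass, is a hypothetical constraint invented purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

n = 7  # chord degrees I..vii°

def neg_entropy(p, eps=1e-12):
    # Negative Shannon entropy (minimised, so entropy is maximised).
    return np.sum(p * np.log2(p + eps))

# Hypothetical rule: dominant-function chords (V, vii°) together carry 55%
# of the probability mass. Encoded as a linear equality constraint.
dominant = np.array([0, 0, 0, 0, 1, 0, 1], dtype=float)
cons = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},
    {"type": "eq", "fun": lambda p: dominant @ p - 0.55},
]
res = minimize(neg_entropy, np.full(n, 1 / n), bounds=[(0, 1)] * n,
               constraints=cons, method="SLSQP")
p_star = res.x
print(np.round(p_star, 3))
```

The maximum-entropy solution is "maximally noncommittal": it spreads the constrained 55% uniformly over V and vii° (0.275 each) and the remaining 45% uniformly over the other five degrees (0.09 each), asserting nothing beyond what the rule demands.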
Multiple rules can be combined by simply concatenating their constraint matrices. A hidden switching state \(R_i\) selects which rule governs each note, and Bayesian inference reveals which rules are activated or violated at each point in a piece — enabling the comparison of entire music-theoretic rule sets (Narmour's implication-realization model, Larson's musical forces, Huron's voice leading rules) within a single unified framework.