Module 5 · Weeks 10–11 · Graduate
Information Theory & Musical Expectation
Harmonic progressions modelled as Markov chains reveal measurable information content. Shannon entropy quantifies how “surprising” each chord transition is—high entropy means high uncertainty, low entropy means the next chord is highly predictable.
Markov Harmony Engine
Shannon entropy from I: H = 2.459 bits (max 2.807)
Next-chord probability distribution from I:
Transition matrix (Bach), entries in %:
| from \ to | I | ii | iii | IV | V | vi | vii° |
|---|---|---|---|---|---|---|---|
| I | 5 | 10 | 5 | 25 | 35 | 10 | 10 |
| ii | 10 | 5 | 5 | 10 | 50 | 10 | 10 |
| iii | 5 | 5 | 5 | 40 | 5 | 30 | 10 |
| IV | 15 | 10 | 5 | 5 | 45 | 10 | 10 |
| V | 50 | 5 | 5 | 10 | 5 | 20 | 5 |
| vi | 10 | 25 | 5 | 25 | 25 | 5 | 5 |
| vii° | 45 | 5 | 10 | 5 | 20 | 10 | 5 |
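As a check on the engine's readout, the per-row entropies \( H_i = -\sum_j p_{ij} \log_2 p_{ij} \) of the Bach matrix above can be computed directly. A minimal NumPy sketch, using the matrix exactly as it appears in the table:

```python
import numpy as np

# Bach transition matrix from the table above (rows given in %, so divide by 100).
states = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]
P = np.array([
    [ 5, 10,  5, 25, 35, 10, 10],   # from I
    [10,  5,  5, 10, 50, 10, 10],   # from ii
    [ 5,  5,  5, 40,  5, 30, 10],   # from iii
    [15, 10,  5,  5, 45, 10, 10],   # from IV
    [50,  5,  5, 10,  5, 20,  5],   # from V
    [10, 25,  5, 25, 25,  5,  5],   # from vi
    [45,  5, 10,  5, 20, 10,  5],   # from vii°
]) / 100.0

# Per-row Shannon entropy: H_i = -sum_j p_ij * log2(p_ij)
H = -(P * np.log2(P)).sum(axis=1)
for s, h in zip(states, H):
    print(f"H({s}) = {h:.3f} bits (max {np.log2(7):.3f})")
```

The first row reproduces the H = 2.459 bits shown by the engine for state I; the maximum of log₂ 7 ≈ 2.807 bits would occur only if all seven continuations were equally likely.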
Mathematical Framework
A first-order Markov chain on chord states \( S = \{I, ii, iii, IV, V, vi, vii^\circ\} \) is determined by a stochastic matrix \( P \), where \( P_{ij} = \Pr(X_{t+1} = j \mid X_t = i) \):
\( P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{17} \\ p_{21} & p_{22} & \cdots & p_{27} \\ \vdots & & \ddots & \vdots \\ p_{71} & p_{72} & \cdots & p_{77} \end{pmatrix}, \quad \sum_{j=1}^{7} p_{ij} = 1 \)
Stationary distribution: if \( P \) is irreducible and aperiodic, there exists a unique row vector \( \pi \) satisfying \( \pi P = \pi \) and \( \sum_i \pi_i = 1 \). This is the long-run frequency of each chord degree.
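Since \( \pi P = \pi \) says that \( \pi \) is a left eigenvector of \( P \) with eigenvalue 1, the stationary distribution can be found numerically from an eigendecomposition of \( P^\top \). A sketch, assuming the Bach matrix from the table above:

```python
import numpy as np

# Bach transition matrix (rows sum to 1), from the table above.
P = np.array([
    [.05, .10, .05, .25, .35, .10, .10],
    [.10, .05, .05, .10, .50, .10, .10],
    [.05, .05, .05, .40, .05, .30, .10],
    [.15, .10, .05, .05, .45, .10, .10],
    [.50, .05, .05, .10, .05, .20, .05],
    [.10, .25, .05, .25, .25, .05, .05],
    [.45, .05, .10, .05, .20, .10, .05],
])

# pi P = pi  <=>  pi is a right eigenvector of P.T with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()   # normalise so the entries sum to 1

for s, p in zip(["I", "ii", "iii", "IV", "V", "vi", "vii°"], pi):
    print(f"pi({s}) = {p:.3f}")
```

For an irreducible aperiodic chain the eigenvalue 1 is simple (Perron–Frobenius), so the normalised eigenvector is the unique stationary distribution.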
Shannon entropy of the transition distribution from state \( i \):
\( H_i = -\sum_{j=1}^{7} p_{ij} \log_2 p_{ij} \)
Low \( H_i \) means state \( i \) is highly predictable—the listener “knows” what comes next. High \( H_i \) signals surprise. The entropy rate of the chain is:
\( \bar{H} = \sum_{i} \pi_i H_i = -\sum_{i} \pi_i \sum_{j} p_{ij} \log_2 p_{ij} \)
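Putting the two pieces together, the entropy rate of the Bach matrix is the stationary-weighted average of the row entropies. A sketch, again assuming the matrix from the table:

```python
import numpy as np

# Bach transition matrix from the table (rows sum to 1).
P = np.array([
    [.05, .10, .05, .25, .35, .10, .10],
    [.10, .05, .05, .10, .50, .10, .10],
    [.05, .05, .05, .40, .05, .30, .10],
    [.15, .10, .05, .05, .45, .10, .10],
    [.50, .05, .05, .10, .05, .20, .05],
    [.10, .25, .05, .25, .25, .05, .05],
    [.45, .05, .10, .05, .20, .10, .05],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()

# Entropy rate: H_bar = sum_i pi_i * H_i, the average bits per transition.
H_rows = -(P * np.log2(P)).sum(axis=1)
H_bar = pi @ H_rows
print(f"entropy rate = {H_bar:.3f} bits/transition (max {np.log2(7):.3f})")
```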
This measures the average information per chord transition in the long run. Bach chorales typically have lower entropy rates than jazz standards, reflecting stronger harmonic conventions and tighter voice-leading constraints.
Python Simulation: Shannon Entropy of Harmonic Progressions
Comparing Bach, Romantic, and Jazz transition matrices through the lens of information theory: heatmaps, per-row entropy, and stationary distributions via eigenvalue decomposition.
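The analysis the simulation performs for each style (per-row entropy, stationary distribution via eigendecomposition, entropy rate) can be sketched as one reusable function. Only the Bach matrix from the table above is reproduced here; the Romantic and Jazz matrices and the heatmap plotting are omitted:

```python
import numpy as np

def analyze(P):
    """Per-row entropy, stationary distribution, and entropy rate of a stochastic matrix."""
    # Guard against zero entries: 0 * log2(0) is taken as 0.
    logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
    H_rows = -(P * logP).sum(axis=1)            # H_i per state
    vals, vecs = np.linalg.eig(P.T)             # pi P = pi
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()
    H_rate = pi @ H_rows                        # bits per transition
    return H_rows, pi, H_rate

# Bach matrix from the table above.
bach = np.array([
    [.05, .10, .05, .25, .35, .10, .10],
    [.10, .05, .05, .10, .50, .10, .10],
    [.05, .05, .05, .40, .05, .30, .10],
    [.15, .10, .05, .05, .45, .10, .10],
    [.50, .05, .05, .10, .05, .20, .05],
    [.10, .25, .05, .25, .25, .05, .05],
    [.45, .05, .10, .05, .20, .10, .05],
])
H_rows, pi, H_rate = analyze(bach)
print(f"Bach entropy rate: {H_rate:.3f} bits/transition")
```

Calling `analyze` on each style's matrix and comparing the returned entropy rates reproduces the Bach-versus-jazz comparison described above.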
Advanced: Maximum Entropy Stochastic Grammars
Leistikow, R.J. (2006). Bayesian Modeling of Musical Expectations via Maximum Entropy Stochastic Grammars. Stanford University.
The Markov models above use transition matrices learned directly from data. A more principled approach, developed in Leistikow's Stanford dissertation, is to encode music-theoretic rules as parameterized linear constraints on the transition matrix, then find the distribution that maximizes the entropy rate \( H_r = -\sum_{k,l} \mu_k T_{k,l} \log_2 T_{k,l} \) subject to those constraints (here \( \mu \) is the stationary distribution and \( T \) the transition matrix, corresponding to \( \pi \) and \( P \) above).
This is a convex optimization problem with guaranteed convergence. The maximum entropy rate distribution encodes everything the rules assert while assuming nothing else — the “maximally noncommittal” distribution. By the asymptotic equipartition property, this maximizes the number of typical musical sequences the model generates, making it as widely applicable as possible.
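The flavour of this optimization can be illustrated with a toy sketch (not Leistikow's full grammar machinery): since Shannon entropy is concave, maximizing it under linear constraints is a convex problem a generic solver handles reliably. The "rule" below, that dominant-function chords receive 55% of the probability mass, is a hypothetical constraint invented purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

n = 7  # chord degrees I..vii°

def neg_entropy(p, eps=1e-12):
    # Negative Shannon entropy (minimised, so entropy is maximised).
    return np.sum(p * np.log2(p + eps))

# Hypothetical rule: dominant-function chords (V, vii°) together carry 55%
# of the probability mass. Encoded as a linear equality constraint.
dominant = np.array([0, 0, 0, 0, 1, 0, 1], dtype=float)
cons = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},
    {"type": "eq", "fun": lambda p: dominant @ p - 0.55},
]
res = minimize(neg_entropy, np.full(n, 1 / n), bounds=[(0, 1)] * n,
               constraints=cons, method="SLSQP")
p_star = res.x
print(np.round(p_star, 3))
```

The maximum-entropy solution is "maximally noncommittal": it spreads the constrained 55% uniformly over V and vii° (0.275 each) and the remaining 45% uniformly over the other five degrees (0.09 each), asserting nothing beyond what the rule demands.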
Multiple rules can be combined by simply concatenating their constraint matrices. A hidden switching state \(R_i\) selects which rule governs each note, and Bayesian inference reveals which rules are activated or violated at each point in a piece — enabling the comparison of entire music-theoretic rule sets (Narmour's implication-realization model, Larson's musical forces, Huron's voice leading rules) within a single unified framework.