
Neuroscience of Music Perception

How does a pressure wave in air become a Beethoven symphony in your mind? This module traces the complete neural pathway from the outer ear to the motor system, explores the brain as a prediction machine that rewards itself for anticipating musical events, and reveals why music triggers dopamine, chills, and tears.

1. The Auditory Pathway

Sound travels through eight distinct processing stages before it becomes a conscious musical experience. Each stage performs a specific mathematical transformation. Click any card to expand its full detail.

Sound to Perception: 8 Stages of Auditory Processing

1. Outer Ear
2. Middle Ear
3. Cochlea
4. Auditory Nerve
5. Brainstem
6. Auditory Cortex
7. Limbic System
8. Motor System

2. The Basilar Membrane

The cochlea performs a biological Fourier transform. The basilar membrane maps frequency to position via the Greenwood function:

\( f(x) = 165.4\bigl(10^{0.06\,x} - 0.88\bigr) \)

where \(x\) is the distance from the apex in mm (0 mm = apex, low frequencies; 35 mm = base, high frequencies).
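A minimal Python sketch of this map and its inverse, using the constants given above (the example notes are arbitrary):

```python
import numpy as np

def greenwood(x_mm):
    """Greenwood (1990) frequency-position map for the human cochlea.
    x_mm: distance from the apex in mm (0 = apex, ~35 = base)."""
    return 165.4 * (10.0 ** (0.06 * x_mm) - 0.88)

def greenwood_inverse(f_hz):
    """Position (mm from apex) of the peak excitation for a frequency."""
    return np.log10(f_hz / 165.4 + 0.88) / 0.06

print(f"f(0)  = {greenwood(0.0):8.1f} Hz  (apex)")
print(f"f(35) = {greenwood(35.0):8.1f} Hz  (base, ~20 kHz)")
for f in [261.63, 440.0, 1000.0, 4000.0]:   # C4, A4, 1 kHz, 4 kHz
    print(f"{f:8.2f} Hz -> {greenwood_inverse(f):5.2f} mm from apex")
```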

Click a note or chord below to hear it and see the excitation pattern on the unrolled membrane. Notice how closer frequencies create overlapping patterns (roughness/dissonance).

[Interactive diagram: the unrolled basilar membrane, from base (high frequencies, ~20 kHz) down to apex (low frequencies, ~200 Hz). Click a note or chord to see its excitation pattern.]

Single Notes (C4 to C5)

Chords (multi-peak excitation)

Key insight: When two frequencies are close together (like C4 and C#4), their excitation patterns overlap on the basilar membrane. This overlap causes the sensation of roughness or beating, which the brain interprets as dissonance. The critical bandwidth (roughly a minor third in the mid range) defines the frequency separation below which two tones excite overlapping regions and sound rough, rather than being resolved as two smooth, separate pitches.
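A toy sketch of this idea: map two frequencies to membrane positions with the inverse Greenwood function above, then score the overlap of their excitation peaks. The Gaussian overlap and the ~1.5 mm spread are illustrative assumptions standing in for the critical bandwidth, not a calibrated roughness model:

```python
import numpy as np

def position_mm(f_hz):
    """Inverse Greenwood map: mm from apex of the excitation peak."""
    return np.log10(f_hz / 165.4 + 0.88) / 0.06

def overlap_roughness(f1, f2, spread_mm=1.5):
    """Toy roughness index: Gaussian overlap of two excitation peaks.
    spread_mm is an assumed stand-in for the critical bandwidth."""
    d = abs(position_mm(f1) - position_mm(f2))
    return np.exp(-(d / spread_mm) ** 2)

c4 = 261.63
for name, semitones in [("C4+C#4 (minor 2nd)", 1), ("C4+D4 (major 2nd)", 2),
                        ("C4+E4 (major 3rd)", 4), ("C4+G4 (perfect 5th)", 7)]:
    r = overlap_roughness(c4, c4 * 2 ** (semitones / 12))
    print(f"{name:20s} roughness = {r:.3f}")
```

Wider intervals land farther apart on the membrane, so the overlap score falls off, matching the ordering of perceived roughness described above.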

3. The Predictive Brain

The brain is not a passive receiver of sound. Following the framework of Karl Friston (free energy principle) and rooted in Helmholtz's idea of "unconscious inference," the auditory cortex is a prediction machine that constantly generates expectations about the next musical event and forwards only the prediction error to higher levels.

Core Equations of Predictive Coding

Prediction error: \( \varepsilon = s - \mu \) where \( s \) is the sensory input and \( \mu \) is the top-down prediction.

Free energy (to be minimised): \( F = \sum_i \left( \frac{\varepsilon_i^2}{2\sigma_i^2} + \ln \sigma_i \right) \)

Dopamine response: \( \Delta D \propto |\varepsilon| \times S \) where \( S \) is salience. But crucially, surprise must be learnable to be rewarding.
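A numeric toy illustrating the three equations above for a listener expecting A4 = 440 Hz; the prediction, its uncertainty, and the salience value are invented for illustration:

```python
import numpy as np

# Toy numbers for the three equations above: a listener predicts the
# next pitch mu with uncertainty sigma; sensory input s arrives.
mu, sigma = 440.0, 5.0      # top-down prediction and its uncertainty
salience = 1.0              # assumed salience weighting S

for s in [440.0, 446.0, 480.0]:   # expected / mildly / very surprising
    eps = s - mu                                   # prediction error
    F = eps**2 / (2 * sigma**2) + np.log(sigma)    # free-energy term
    dopamine = abs(eps) * salience                 # Delta D proxy
    print(f"s={s:6.1f}  eps={eps:+6.1f}  F={F:8.3f}  ~Delta D={dopamine:5.1f}")
```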

Click each harmonic event below to hear the chord transition and see the prediction error and reward dynamics. Notice: maximum reward comes not from zero surprise (boring) or maximum surprise (noise), but from the sweet spot where surprise is high but comprehensible.

The Wundt Curve: Berlyne (1971) revived Wundt's inverted-U relationship between stimulus complexity and hedonic value. Too simple = boring, too complex = aversive, optimal complexity = maximally rewarding. The predictive coding framework provides the neural mechanism: reward peaks when prediction error is large enough to trigger dopamine but small enough to be rapidly resolved by model updating.

4. Emotion & Reward

Music is one of the most potent activators of the brain reward system. These four findings reveal why a sequence of pressure waves can make you cry, dance, or shiver.

Music & Language: Shared Neural Architecture

Music and language are the two most complex auditory abilities unique to humans. They share surprising neural overlap — and illuminating differences.

LANGUAGE: Broca area (syntax), Wernicke area (semantics), left planum temporale, phonological processing.

MUSIC: right auditory cortex (pitch), right planum temporale, tonal contour processing, harmonic structure.

SHARED: auditory cortex, working memory, syntax processing, basal ganglia (timing).

Both domains require hierarchical structure, temporal prediction, and emotional processing.

Shared Syntactic Integration Resource Hypothesis (SSIRH)

Patel (2003) proposed that music and language share syntactic processing resources in the inferior frontal gyrus (Broca area). Musically unexpected chords (e.g., Neapolitan sixth in C major) cause the same ERP component (ERAN — Early Right Anterior Negativity) as syntactically unexpected words. Musical training enhances this shared resource, improving grammatical processing in both domains.

The Musicophilia Spectrum

Amusia (tone-deafness) affects ~4% of the population. Congenital amusics cannot distinguish pitch differences smaller than 2 semitones. Remarkably, their language prosody is often preserved — suggesting partially independent pitch-processing pathways. On the other end, absolute pitch (~1 in 10,000) involves categorical pitch perception: each frequency maps to a named note, as effortlessly as colour maps to a name. Both conditions are partially genetic.

Musical Training Effects on the Brain

Professional musicians are the neuroscientist's favourite subjects. Years of intensive practice produce measurable structural and functional brain changes:

  • Corpus Callosum: 10-15% larger anterior section. Bimanual coordination requires massive interhemispheric communication. (Schlaug et al., 1995)
  • Auditory Cortex: 130% more grey matter in Heschl gyrus. Thousands of hours of fine-grained pitch discrimination. (Schneider et al., 2002)
  • Motor Cortex: expanded hand representation (homunculus). Fine motor skills: a pianist executes ~1,800 notes/minute in a Liszt etude. (Elbert et al., 1995)
  • Cerebellum: 5% larger volume. Rhythm, timing, and motor sequence coordination. (Hutchinson et al., 2003)
  • Planum Temporale: stronger leftward asymmetry. Enhanced frequency discrimination and absolute pitch. (Keenan et al., 2001)
  • Arcuate Fasciculus: larger and more myelinated. Stronger connection between auditory and motor regions. (Halwani et al., 2011)
  • Hippocampus: larger volume in older musicians. Music as cognitive reserve against age-related decline. (Hanna-Pladdy & MacKay, 2011)
  • Prefrontal Cortex: enhanced executive function networks. Working memory, attentional control, and planning during performance. (Zuk et al., 2014)
  • Broca Area: enhanced syntactic processing. Hierarchical structure processing shared with language (OPERA hypothesis). (Patel, 2011)

Critical Period & Sensitive Period

Musicians who begin training before age 7 show significantly larger structural brain changes than those who start later — even when total years of training are matched. This suggests a sensitive period for music-driven neuroplasticity, analogous to the critical period for language acquisition (Lenneberg, 1967). The sensitive period is not absolute: adults can still learn music and show brain changes, but the magnitude is smaller.

Absolute pitch provides the clearest evidence: virtually all absolute pitch possessors began training before age 6. After age 12, absolute pitch acquisition becomes extremely rare regardless of training intensity. The genetic component (familial clustering) interacts with early exposure — genes load the gun, but early training pulls the trigger.

The Wundt Curve: Optimal Complexity

The German psychologist Wilhelm Wundt (1874) proposed an inverted-U relationship between stimulus complexity and hedonic value. Applied to music (Berlyne, 1971): pieces that are too simple (low information content) are boring; pieces that are too complex (high information content) are perceived as noise. Maximum pleasure occurs at an intermediate complexity level that matches the listener's current model — complex enough to generate prediction errors, but structured enough for those errors to be learnable.

[Figure: the Wundt curve, an inverted U. X-axis: stimulus complexity / information content; Y-axis: pleasure / hedonic value. The curve rises from BORING (too predictable) to an optimal sweet spot, then falls to NOISE (too unpredictable). Example stimuli along the axis: nursery rhyme, Mozart, late Beethoven, Stravinsky, free noise.]

Crucially, the optimal point shifts with expertise. A trained musician finds a Mozart sonata too predictable (understimulating), while a Schoenberg piece that seems like noise to a novice reveals its structure after study. The Wundt curve is not fixed — it moves rightward with musical training, as the brain's predictive model becomes more sophisticated and can extract regularity from increasingly complex stimuli.

This connects directly to the information theory of Module 5: the optimal point on the Wundt curve corresponds to the complexity level that maximizes \(\Delta D \propto \text{prediction error} \times \text{learnability}\). Shannon entropy alone is insufficient — random noise has maximum entropy but zero musical value. The key is structured complexity: high entropy at the surface, but with learnable deep structure.
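A minimal sketch of this reward model. The functional forms for prediction error and learnability below are illustrative assumptions chosen only to produce an inverted U; scaling complexity by expertise shifts the peak rightward, as described above:

```python
import numpy as np

def wundt_reward(complexity, expertise=1.0):
    """Toy inverted-U: reward = prediction error x learnability.
    The functional forms are illustrative assumptions only."""
    c = complexity / expertise          # expertise rescales complexity
    prediction_error = 1 - np.exp(-c)   # grows with complexity, saturates
    learnability = np.exp(-c**2 / 2)    # collapses when complexity outruns the model
    return prediction_error * learnability

cs = np.linspace(0.0, 4.0, 81)
novice = wundt_reward(cs, expertise=1.0)
expert = wundt_reward(cs, expertise=2.0)
print(f"novice peak at complexity {cs[np.argmax(novice)]:.2f}")
print(f"expert peak at complexity {cs[np.argmax(expert)]:.2f}  (shifted right)")
```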

Music Therapy: Clinical Applications

The neuroscience of music perception has direct clinical applications. Music activates distributed brain networks — auditory, motor, emotional, memory — making it a uniquely powerful therapeutic tool:

  • Parkinson Disease: Rhythmic Auditory Stimulation (RAS). An external beat bypasses damaged basal ganglia timing circuits; gait velocity improves 10-15% and stride length increases 12%. (Thaut et al., 1996; McIntosh et al., 1997)
  • Stroke Aphasia: Melodic Intonation Therapy (MIT). Singing activates right-hemisphere homologues of left-hemisphere language areas; patients who cannot speak can often sing, then gradually shift from singing to speaking. (Schlaug et al., 2010)
  • Alzheimer Disease: Familiar Music Listening. Musical memories are stored in medial prefrontal cortex and cerebellum, among the last regions affected by Alzheimer; late-stage patients who cannot recognize family members can still sing songs from their youth. (Jacobsen et al., 2015)
  • Depression & Anxiety: Active Music-Making / Guided Imagery. Music modulates cortisol, oxytocin, and serotonin; group drumming reduces cortisol by 28% and increases natural killer cell activity, and listening to self-selected music reduces anxiety scores by 65% in pre-surgical patients. (Bittman et al., 2001; Nilsson, 2008)
  • Chronic Pain: Music-Assisted Relaxation. Music activates descending pain-inhibition pathways via the periaqueductal gray; preferred music reduces pain ratings by 21% and opioid consumption by 38% post-surgery. (Hole et al., 2015, systematic review)
  • Autism Spectrum: Improvisational Music Therapy. Musical interaction provides a non-verbal communication channel; shared rhythm and turn-taking in improvisation develop social reciprocity, joint attention, and emotional expression. (Geretsegger et al., 2014, Cochrane review)

Python: Auditory Perception Models

This simulation models the Greenwood tonotopic function, the Wundt curve, and auditory nerve firing patterns for different musical stimuli.

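The original script.py is not reproduced here. The Greenwood map and the Wundt curve are sketched in the snippets above; what follows is a minimal sketch of the remaining ingredient, an auditory-nerve firing model. The rate-coding tuning curve and Poisson spiking below are illustrative assumptions, not the module's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def firing_rate(f_stim, cf, max_rate=200.0, spont=5.0, q=8.0):
    """Toy rate-coding tuning curve for one auditory nerve fiber.
    cf = characteristic frequency; tuning is Gaussian in log-frequency.
    max_rate, spont, and q are illustrative parameters."""
    octaves = np.log2(f_stim / cf)
    return spont + max_rate * np.exp(-(q * octaves) ** 2)

def spike_count(rate_hz, duration_s=0.1):
    """Sample a spike count from a homogeneous Poisson process."""
    return rng.poisson(rate_hz * duration_s)

cfs = 440.0 * 2.0 ** np.linspace(-1, 1, 9)   # fibers tuned from 220 to 880 Hz
for cf in cfs:
    r = firing_rate(440.0, cf)               # population response to an A4 tone
    print(f"CF {cf:6.1f} Hz  rate {r:6.1f} sp/s  spikes/100 ms: {spike_count(r)}")
```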

The Bayesian Brain & Music

The most powerful framework for understanding music perception is the Bayesian brain hypothesis: the brain is not a passive receiver of sensory input but an active prediction machine that continuously generates probabilistic models of the world and updates them when predictions fail.

Bayes' Theorem as the Brain's Operating System

At every moment, the auditory cortex maintains a prior distribution over what it expects to hear next, based on everything it has learned about musical structure. When a new sound arrives, the brain computes the posterior via Bayes' rule:

\( P(\text{hypothesis} \mid \text{data}) = \frac{P(\text{data} \mid \text{hypothesis}) \cdot P(\text{hypothesis})}{P(\text{data})} \)

posterior = likelihood × prior / evidence

In musical terms: the prior is what you expect (the next chord in a progression), the likelihood is how well the actual sound matches that expectation, and the posterior is your updated belief after hearing it. The prediction error \(\varepsilon = s - \mu\) drives learning and emotional response.
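A minimal sketch of one such update after a dominant chord in C major. The candidate chords, prior, and likelihood values are invented for illustration:

```python
import numpy as np

# Which chord follows V in C major? Candidates, prior, and likelihood
# are invented illustration values, loosely shaped like tonal grammar.
chords = ["I", "vi", "IV", "bVI"]
prior = np.array([0.70, 0.15, 0.10, 0.05])          # P(hypothesis)
likelihood = np.array([0.05, 0.80, 0.10, 0.05])     # P(data | hypothesis):
                                                    # the ear reports vi-like pitches
posterior = likelihood * prior
posterior /= posterior.sum()                        # divide by evidence P(data)

for c, pr, po in zip(chords, prior, posterior):
    print(f"{c:>4}: prior {pr:.2f} -> posterior {po:.2f}")
# A deceptive cadence: posterior mass shifts from I to vi.
```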

Predictive Coding Hierarchy

Karl Friston's free energy principle extends Bayesian inference to a hierarchical predictive coding framework. The brain minimizes variational free energy \(F \geq -\ln P(\text{data})\) — an upper bound on surprise. Applied to music:

Level 3: Musical form (sonata, ABA, verse-chorus); timescale 5-60 seconds
  ↓ predictions down / errors up ↑
Level 2: Harmonic context (key, chord progression); timescale 200 ms - 5 s
  ↓ predictions down / errors up ↑
Level 1: Acoustic features (pitch, timbre, onset); timescale 10-50 ms

Higher levels send top-down predictions to lower levels; lower levels send bottom-up prediction errors to higher levels. Music manipulates this hierarchy: a deceptive cadence generates a prediction error at Level 2 (harmonic) while Level 3 (form) may have predicted the surprise. The interaction between levels creates the rich emotional landscape of musical experience.

Statistical Learning in Music

Infants as young as 8 months can extract statistical regularities from tone sequences after just 2 minutes of exposure (Saffran et al., 1999). The brain builds implicit probabilistic models of musical structure without conscious effort. By age 5, children have internalized the basic harmonic grammar of their culture. This implicit statistical learning is the mechanism behind the priors that drive Bayesian inference in adult music listening.

Information-Theoretic Surprise

The Bayesian framework connects directly to Shannon information theory. The surprisal of an event is \(I(x) = -\log_2 P(x)\) bits — the information content of hearing chord \(x\) given your current model. A V→I cadence carries ~0.5 bits (highly expected); a chromatic mediant carries ~3-4 bits (surprising but learnable); a random atonal cluster carries ~6+ bits (noise). The emotional response maps onto this information-theoretic quantity via the dopamine system.
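The conversion between probability and bits is direct; in this sketch the probabilities are back-calculated from the bit values quoted above, so they are illustrative only:

```python
import numpy as np

# Surprisal I(x) = -log2 P(x). Probabilities are back-calculated from
# the bit values quoted above, so they are illustrative only.
events = {"V -> I cadence":        0.71,    # ~0.5 bits
          "chromatic mediant":     0.08,    # ~3.6 bits
          "random atonal cluster": 0.015}   # ~6 bits

for name, p in events.items():
    print(f"{name:22s} P = {p:.3f}  surprisal = {-np.log2(p):4.1f} bits")
```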

Huron's ITPRA Theory

David Huron's ITPRA model (2006) decomposes the emotional response to a musical event into five temporally ordered components:

  • I = Imagination (before the event)
  • T = Tension (just before)
  • P = Prediction (at the event)
  • R = Reaction (immediate)
  • A = Appraisal (after)

The P (Prediction) component is the Bayesian core: the brain rewards itself for correct predictions and generates a penalty signal for prediction errors. But the R (Reaction) component is pre-cognitive — the brainstem responds to sudden loud sounds or dissonance before the cortex has time to analyze them. The full emotional response to music is the sum of all five components, unfolding over 0.1 to 5 seconds.

Bayesian Modeling of Musical Expectations

Based on Leistikow, R.J. (2006). Bayesian Modeling of Musical Expectations via Maximum Entropy Stochastic Grammars. Ph.D. dissertation, Stanford University. Advisor: Jonathan Berger.

The Leistikow dissertation presents a rigorous computational framework for modeling how listeners form, update, and violate musical expectations — using dynamic Bayesian networks and maximum entropy distributions. The core insight: musical style can be encoded as a set of parameterized rules (e.g., "a large upward interval tends to be followed by a smaller downward interval"), and the maximum entropy rate distribution satisfying those rules is the uniquely correct choice for inference, because it encodes everything known while carefully avoiding any unintended bias.

Dynamic Bayesian Networks for Melody

A melody is modeled as a first-order Markov chain of notes \(N_1, N_2, \ldots, N_K\), where each note depends on its predecessor. The joint distribution factors as \(P(N_{1:K}) = P(N_1)\prod_{i=2}^{K} P(N_i \mid N_{i-1})\). Adding a hidden state \(S_i\) (musical mode, active rule, harmony) creates an autoregressive hidden Markov model (AR-HMM) that fuses bottom-up (data-driven) and top-down (schema-driven) processes — exactly matching the dual-process architecture proposed by Narmour and validated by Krumhansl.
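A minimal sketch of the Markov part of this model: estimate a transition matrix from a tiny invented corpus (with add-one smoothing), then report the surprisal of each transition in a test melody. The pentatonic note set and corpus are illustrative assumptions, not the dissertation's data:

```python
import numpy as np

# A tiny first-order Markov model of melody over a pentatonic note set.
# The "corpus" melody is invented for illustration.
notes = ["C", "D", "E", "G", "A"]
idx = {n: i for i, n in enumerate(notes)}

corpus = "C D E G E D C D E D C A C D E G A G E D C".split()
T = np.ones((5, 5))                    # add-one smoothing: no zero probabilities
for a, b in zip(corpus, corpus[1:]):
    T[idx[a], idx[b]] += 1
T /= T.sum(axis=1, keepdims=True)      # rows are P(next | current)

melody = "C D E G C A".split()
for a, b in zip(melody, melody[1:]):
    s = -np.log2(T[idx[a], idx[b]])    # surprisal of each transition
    print(f"{a} -> {b}: {s:4.2f} bits")
```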

Maximum Entropy Rate Principle

Music theory rules are inherently incomplete — they say "should" and "tends to" but never give exact probabilities. The solution: encode each rule as a linear constraint on the transition matrix, then maximize the entropy rate \(H_r = -\sum_{k,l} \mu_k T_{k,l} \log_2 T_{k,l}\) subject to those constraints. This yields the distribution that is "as uniform as possible given the rules" — encoding everything known while assuming nothing else. The asymptotic equipartition property (AEP) guarantees this maximizes the number of typical musical sequences.
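The full entropy-rate maximization couples every row of \(T\) through the stationary distribution \(\mu\); as a simplified sketch, the following maximizes the entropy of a single row \(P(\text{next} \mid \text{current})\) under one rule-style constraint, a target expected interval size. The maximum-entropy solution then takes the exponential (Gibbs) form, and the multiplier is found by bisection. The interval range and target value are illustrative assumptions:

```python
import numpy as np

# Simplified maximum-entropy sketch: one row P(next | current), expressed
# over signed intervals, constrained to a target mean absolute interval.
intervals = np.arange(-7, 8)           # candidate moves: -7..+7 semitones

def maxent_row(target_mean_abs, lo=-5.0, hi=5.0, iters=60):
    """Max-entropy distribution with E[|interval|] = target_mean_abs.
    The solution has Gibbs form P(k) ~ exp(-lam * |k|); lam is found
    by bisection (mean |interval| decreases as lam grows)."""
    def mean_abs(lam):
        w = np.exp(-lam * np.abs(intervals))
        return (w / w.sum()) @ np.abs(intervals)
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean_abs(mid) > target_mean_abs else (lo, mid)
    w = np.exp(-lo * np.abs(intervals))
    return w / w.sum()

p = maxent_row(2.0)                    # rule: "prefer steps over leaps"
print("P(step of +1):", round(p[intervals.tolist().index(1)], 3))
print("P(leap of +7):", round(p[intervals.tolist().index(7)], 3))
print("entropy:", round(-(p * np.log2(p)).sum(), 2), "bits")
```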

Surprisal & Information-Theoretic Listening

At each note, the system computes a predictive distribution \(P(N_{i+1} \mid n_{1:i})\) and, after observing \(n_{i+1}\), measures the surprisal: \(-\log_2 P(n_{i+1} \mid n_{1:i})\) bits. High surprisal = unexpected note = strong emotional response. The entropy of the predictive distribution measures the uncertainty of the expectation. The "Shave and a Haircut" example in the dissertation shows how F# following G generates 6.6 bits of surprise, while the expected G following F# generates only 0.7 bits.

Inferring Rule Activation & Violation

The hidden state can be a switching variable selecting which rule governs each note transition. Bayesian filtering computes \(P(R_i \mid x_{1:i})\) — the posterior probability of each rule being active at time \(i\). This reveals which musical “forces” (gravity, magnetism, inertia) are responsible for each note, and identifies moments of surprise as rule violations. The dissertation encodes Larson's musical forces: gravity (notes above stable pitches descend), magnetism (unstable notes resolve to nearest stable pitch, with inverse-square distance), and inertia (melodies continue in the same direction).

The Hierarchical Model: Harmony, Meter & Beat Position

Chapter 7 of the dissertation extends the basic model to include hidden variables for meter \(L_i\), beat position \(B_i\), harmony \(H_i\), and note duration \(D_i\). Two key musical tendencies are encoded:

  1. Chord changes occur more frequently on strong beats than on weak beats
  2. Notes on strong beats are more likely to be chord members than notes on weak beats

Bayesian inference inverts these generative relationships: from the sequence of observed notes, the system infers harmony, meter, and beat position simultaneously. Applied to Bach's Fugue in A minor (BWV 543), the system demonstrates “foot-tapping” — gradually locking onto the correct beat position as evidence accumulates, then retrospectively sharpening its earlier estimates via backward smoothing.
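A toy version of this foot-tapping inference, reduced to tendency 1 from the list above: a forward filter over the four possible downbeat phases, updated by where chord changes fall. The change probabilities are invented, and this is far simpler than the dissertation's full dynamic Bayesian network:

```python
import numpy as np

# Toy "foot-tapping": infer the downbeat phase of a 4-beat meter from
# where chord changes fall. The change probabilities are invented.
P_STRONG, P_WEAK = 0.6, 0.1   # P(chord change | strong beat / weak beat)

def filter_phase(changes):
    """Forward (Bayesian) filter: posterior over the 4 downbeat phases."""
    post = np.full(4, 0.25)                  # uniform prior
    for t, c in enumerate(changes):
        for phase in range(4):
            strong = (t - phase) % 4 == 0    # is beat t a downbeat here?
            p = P_STRONG if strong else P_WEAK
            post[phase] *= p if c else (1 - p)
        post /= post.sum()                   # normalise at each step
    return post

# Chord changes actually fall on beats 2, 6, 10, 14 (true phase = 2).
changes = [1 if t % 4 == 2 else 0 for t in range(16)]
print("posterior over phases 0-3:", np.round(filter_phase(changes), 3))
```

As evidence accumulates, the posterior locks onto the correct phase, mirroring the gradual beat-finding behaviour described above.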

Fusing Symbolic & Signal Layers

Chapter 8 shows how the symbolic expectation models can be seamlessly integrated with audio signal processing. The signal layer extracts STFT peaks and segments the audio into note events; the symbolic layer encodes musical expectations about note transitions. At "note activation frames" (detected onsets), the full expectation hierarchy is activated. Between onsets, all symbolic variables are "latched" — memorizing their values. This creates a system where musical knowledge improves signal processing (resolving octave ambiguities, for example) and signal features inform musical inference (aspects of performance practice not present in any symbolic score).

Python: Bayesian Musical Expectations

This simulation implements the core framework from Leistikow (2006): a first-order Markov model of melody, surprisal computation at each note, maximum entropy rate transition distributions under musical constraints, and a comparison of rule-constrained vs data-driven expectations.

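The original script.py is not included; the Markov/surprisal and maximum-entropy pieces are sketched above. The following minimal sketch covers the remaining piece, the comparison of rule-constrained vs data-driven expectations. Both models, the corpus, the test melody, and the exp(-0.5 |interval|) rule weighting are invented stand-ins for the dissertation's much richer framework:

```python
import numpy as np

# Rule-constrained vs data-driven expectation, compared on one melody.
notes = list(range(60, 73))               # MIDI C4..C5

def rule_model(current):
    """Rule-constrained: P(next) ~ exp(-0.5 |interval|), the Gibbs form
    of a 'prefer small intervals' maximum-entropy constraint."""
    w = np.array([np.exp(-0.5 * abs(n - current)) for n in notes])
    return w / w.sum()

def data_model(corpus):
    """Data-driven: add-one-smoothed bigram counts from a corpus."""
    T = np.ones((len(notes), len(notes)))
    for a, b in zip(corpus, corpus[1:]):
        T[notes.index(a), notes.index(b)] += 1
    return T / T.sum(axis=1, keepdims=True)

corpus = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]
T = data_model(corpus)

melody = [60, 62, 64, 72, 71, 60]         # contains an unexpected leap
for a, b in zip(melody, melody[1:]):
    s_rule = -np.log2(rule_model(a)[notes.index(b)])
    s_data = -np.log2(T[notes.index(a), notes.index(b)])
    print(f"{a} -> {b}: rule {s_rule:5.2f} bits | data {s_data:5.2f} bits")
```

The two surprisal traces diverge exactly where rule and corpus disagree, which is the comparison the original script is described as performing.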

References

  • Salimpoor, V. N. et al. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience, 14(2), 257-262.
  • Blood, A. J. et al. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience, 2(4), 382-387.
  • Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142.
  • Huron, D. (2011). Why is sad music pleasurable? A possible role for prolactin. Musicae Scientiae, 15(2), 146-158.
  • Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
  • Greenwood, D. D. (1990). A cochlear frequency-position function for several species. JASA, 87(6), 2592-2605.
  • Zatorre, R. J. & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural substrates. PNAS, 110(Supplement 2), 10430-10437.
  • Berlyne, D. E. (1971). Aesthetics and Psychobiology. Appleton-Century-Crofts.
  • Jeffress, L. A. (1948). A place theory of sound localization. Journal of Comparative and Physiological Psychology, 41(1), 35-39.
  • Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience, 15(3), 170-180.
  • Patel, A. D. (2003). Language, music, syntax, and the brain. Nature Neuroscience, 6(7), 674-681.
  • Schlaug, G. et al. (1995). Increased corpus callosum size in musicians. Neuropsychologia, 33(8), 1047-1055.
  • Saffran, J. R. et al. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27-52.
  • Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. MIT Press.
  • Thaut, M. H. et al. (1996). Rhythmic auditory stimulation in gait training for Parkinson disease patients. Movement Disorders, 11(2), 193-200.
  • Schlaug, G. et al. (2010). From singing to speaking: facilitating recovery from nonfluent aphasia. Future Neurology, 5(5), 657-665.
  • Jacobsen, J. H. et al. (2015). Why musical memory can be preserved in advanced Alzheimer disease. Brain, 138(8), 2438-2450.
  • Leistikow, R. J. (2006). Bayesian Modeling of Musical Expectations via Maximum Entropy Stochastic Grammars. Ph.D. dissertation, Stanford University. Advisor: Jonathan Berger.
  • Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
  • Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.
  • Larson, S. (2004). Musical forces and melodic expectation. Music Perception, 21(4), 457-498.