Module 7: Song & Acoustic Biophysics

Birdsong is among the most complex acoustic signals produced by any animal. The syrinx — unique to birds — can generate two simultaneous, independently modulated voices. Understanding its biophysics requires fluid dynamics, wave mechanics, and neuroscience. This module derives the physics of sound production, propagation, and the neural circuits for vocal learning.

1. The Syrinx: Two-Voice Sound Source

1.1 Anatomy & Position

Unlike the mammalian larynx (at the top of the trachea), the avian syrinx is located at the tracheo-bronchial junction — where the trachea bifurcates into two bronchi. Most songbirds have a tracheobronchial syrinx with intrinsic musculature in both bronchial branches, enabling independent bilateral control of the two sound sources.

Key structural elements (using oscine/songbird terminology):

MTM: Medial tympaniform membranes. Thin, compliant membranes on the medial wall of each bronchus. The primary vibrating element. Thickness ∼5–20 μm; area ∼1–5 mm\(^2\).
ML: Medial labia. Paired fleshy lips adjacent to MTM in some species. Co-vibrate with MTM.
LL: Lateral labia. Lateral valve controlling airflow. Tension affects both amplitude and frequency.
Rings: Tracheal and bronchial rings. Cartilaginous support. The pessulus is a central cartilage at the tracheo-bronchial junction.
Muscles: Syringeal muscles (6–9 pairs in oscines). Innervated by nXIIts (hypoglossal nerve). Control membrane tension, glottal aperture, and labial position.

1.2 Sound Production: Bernoulli & Membrane Vibration

Air flow from the air sacs through the bronchi past the MTM/labia follows the Bernoulli principle: in the narrow channel between the membrane and the bronchial wall, flow velocity increases and pressure decreases (the Bernoulli effect), drawing the membranes inward. The restoring elastic force then pushes them back outward, establishing oscillation.

The fundamental frequency of the membrane oscillation is determined by its tension \(T\), area density \(\rho_s\) (mass per unit area), and effective length scale \(L\). For a thin membrane under tension, the resonant frequency is:

\[ f_0 = \frac{1}{2L}\sqrt{\frac{T}{\rho_s}} \]

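This has the same form as the fundamental of a string under tension, with \(T\) now a force per unit length (N/m) and \(\rho_s\) a mass per unit area. \(T\) is controlled by syringeal muscle activity; \(\rho_s\) is set by membrane thickness and tissue density, while the elastic modulus (\(E \sim 10\text{--}100\) kPa for biological membranes) determines how much tension a given stretch produces.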
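
As a rough numerical check of the membrane formula, the sketch below evaluates \(f_0\) for one set of illustrative values. The tissue density (close to water), the 10 μm thickness, the 2 mm length scale, and the 4 N/m tension are assumptions chosen to land in the songbird range, not measurements, and the function name is hypothetical.

```python
import math

def membrane_f0(tension, rho_s, length):
    """Fundamental frequency f0 = (1 / 2L) * sqrt(T / rho_s).

    tension : membrane tension T in N/m (force per unit length)
    rho_s   : area density in kg/m^2
    length  : effective length scale L in m
    """
    return math.sqrt(tension / rho_s) / (2.0 * length)

# Illustrative (assumed) values, not measurements
thickness = 10e-6        # m, within the 5-20 um range quoted above
tissue_density = 1000.0  # kg/m^3, assumed close to water
rho_s = tissue_density * thickness   # area density, about 0.01 kg/m^2
L = 2e-3                 # m, assumed effective length
T = 4.0                  # N/m, assumed tension

f0 = membrane_f0(T, rho_s, L)
print(f"f0 = {f0:.0f} Hz")

# Because f0 scales with sqrt(T), quadrupling the tension doubles the pitch
print(f"f0 at 4T = {membrane_f0(4 * T, rho_s, L):.0f} Hz")
```

With these assumed numbers the fundamental comes out at about 5 kHz, and the square-root dependence means large pitch changes need disproportionately large tension changes.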

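Frequency modulation during song occurs on timescales of milliseconds. The rate of frequency change depends on how quickly muscle force can change the membrane tension. For a finch sweeping from 3 to 8 kHz over 50 ms, the average rate is 5 kHz / 50 ms = 100 kHz/s, which relates to the rate of tension change as: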

\[ \frac{df_0}{dt} = \frac{1}{2L\sqrt{\rho_s}} \cdot \frac{1}{2\sqrt{T}} \cdot \frac{dT}{dt} = \frac{f_0}{2T} \frac{dT}{dt} \approx 100\ \text{kHz/s} \]

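The required rate of tension change is \(dT/dt = (2T/f_0)\,(df_0/dt) \approx (2T/f_0) \times 10^5\ \text{Hz/s}\); at \(f_0 = 5\) kHz this is a fractional rate of about \(40\ \text{s}^{-1}\). This is achievable by fast-twitch syringeal muscles with contraction times ∼5–10 ms.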
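
The arithmetic behind the sweep can be sketched as follows. The code assumes a linear sweep in frequency (the text gives only the endpoints and duration), and the function name is hypothetical.

```python
f_start, f_end = 3000.0, 8000.0   # Hz, sweep endpoints from the text
sweep_time = 0.050                # s

# Average sweep rate df0/dt, assuming a linear sweep
df_dt = (f_end - f_start) / sweep_time

def fractional_tension_rate(f0, df_dt):
    """(1/T) dT/dt = (2 / f0) * df0/dt, which follows from f0 proportional to sqrt(T)."""
    return 2.0 * df_dt / f0

# Tension scales with f0^2, so this is the total tension change over the sweep
tension_ratio = (f_end / f_start) ** 2

print(f"sweep rate: {df_dt / 1000:.0f} kHz/s")
for f0 in (f_start, 5000.0, f_end):
    rate = fractional_tension_rate(f0, df_dt)
    print(f"f0 = {f0:.0f} Hz: (1/T) dT/dt = {rate:.1f} per second")
print(f"T_end / T_start = {tension_ratio:.2f}")
```

Under this model the tension has to rise roughly sevenfold across the sweep, with the largest fractional rate at the low-frequency end.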

1.3 Dual Voice: Two Independent Sources

The most striking capability of the oscine syrinx is the production of two simultaneous, harmonically unrelated frequencies. Wood thrushes (Hylocichla mustelina) produce complex chords by using each bronchial half of the syrinx independently. The two sides can operate:

● At different fundamental frequencies (independent pitch control)
● With different amplitude envelopes (independent volume)
● With independent spectral profiles (different harmonic content per side)
● With one side producing while the other is silent (unilateral phonation)

Nerve cutting experiments (severing the right or left n.XII branch) silence the corresponding side, confirming independent neural control. The left side tends to dominate in many species (left-hemisphere lateralization for song motor control — analogous to language lateralization in humans).
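
A minimal sketch of the two-voice idea treats each side of the syrinx as an independent sinusoidal source and sums them. The frequencies, amplitudes, and sample rate are assumed illustrative values, and real sources are far richer than pure tones.

```python
import numpy as np

SR = 44100          # sample rate in Hz (assumed)
n_samples = 4410    # 0.1 s at 44.1 kHz, so FFT bins are 10 Hz apart

t = np.arange(n_samples) / SR

# Two independent sources with harmonically unrelated frequencies (assumed values)
f_left, f_right = 3800.0, 7200.0
a_left, a_right = 0.8, 0.5
left = a_left * np.sin(2 * np.pi * f_left * t)
right = a_right * np.sin(2 * np.pi * f_right * t)

# Unilateral phonation corresponds to setting one amplitude to zero
two_voice = left + right

# Locate the two strongest spectral components
spectrum = np.abs(np.fft.rfft(two_voice))
freqs = np.fft.rfftfreq(n_samples, d=1.0 / SR)
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print("Strongest components (Hz):", np.round(top_two))
```

The spectrum shows both fundamentals at once, neither being a harmonic of the other, which is the signature that distinguishes two sources from one source with overtones.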

Figure 1: Syrinx Cross-Section Diagram

2. Source-Filter Theory & Tracheal Resonance

2.1 Source: Syrinx Harmonic Series

The vibrating membrane generates a harmonic series: the fundamental frequency \(f_0\) plus overtones at \(2f_0, 3f_0, \ldots, nf_0\). For a purely sinusoidal (single-frequency) vibration, only \(f_0\) is produced. The nonlinearity of membrane vibration under Bernoulli forcing generates harmonic distortion, rich in higher partials. The spectrum of the source signal:

\[ s(t) = \sum_{n=1}^{N} a_n \sin(2\pi n f_0 t + \phi_n) \]

The harmonic amplitudes \(a_n\) typically fall off as \(a_n \propto n^{-\alpha}\) (spectral roll-off), with \(\alpha \approx 1\text{--}2\) for voiced sounds.
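
The harmonic sum can be synthesized directly and its roll-off checked with an FFT. This is a sketch under assumptions: all phases \(\phi_n\) are set to zero, \(f_0\) = 1 kHz is chosen only so that every harmonic falls on an exact FFT bin, and the function name is hypothetical.

```python
import numpy as np

def harmonic_source(f0, alpha, n_samples, sr, n_max=20):
    """Sum of harmonics n*f0 with amplitudes a_n = n**(-alpha) and zero phase.

    Harmonics at or above the Nyquist frequency (sr / 2) are skipped to avoid aliasing.
    """
    t = np.arange(n_samples) / sr
    s = np.zeros(n_samples)
    for n in range(1, n_max + 1):
        if n * f0 >= sr / 2:
            break
        s += n ** (-alpha) * np.sin(2 * np.pi * n * f0 * t)
    return s

sr = 44100
n_samples = 4410                 # 0.1 s, so FFT bins are 10 Hz apart
f0, alpha = 1000.0, 1.5          # assumed illustrative values
s = harmonic_source(f0, alpha, n_samples, sr)

# Recover harmonic amplitudes from the FFT (bin index = frequency / 10 Hz)
amps = 2 * np.abs(np.fft.rfft(s)) / n_samples
a1, a2, a3 = amps[100], amps[200], amps[300]
print(f"a2/a1 = {a2 / a1:.3f} (expected 2**-1.5 = {2 ** -1.5:.3f})")
print(f"a3/a1 = {a3 / a1:.3f} (expected 3**-1.5 = {3 ** -1.5:.3f})")
```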

2.2 Filter: Tracheal Tube Resonances (Formants)

The trachea acts as a quarter-wave resonator: open at the beak and approximately closed at the syrinx. Resonance condition:

\[ L = \frac{(2n-1)\lambda}{4} \implies f_n = \frac{(2n-1)c_{\text{sound}}}{4L} \]

Here \(c_{\text{sound}} \approx 350\) m/s at 37°C (inside the trachea, warm humid air), \(L\) is the tracheal length, and \(n = 1, 2, 3, \ldots\) gives formants at odd multiples of \(f_1\). For a zebra finch (\(L \approx 15\) mm): \(f_1 = 350/(4 \times 0.015) \approx 5833\) Hz, in the middle of their song range (2–9 kHz).
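
The resonance condition is easy to tabulate. The sketch below uses the zebra finch values from the text; the 30 mm comparison length is an assumption added only to show the scaling, and the function name is hypothetical.

```python
def tracheal_formants(length_m, c=350.0, n_formants=3):
    """Quarter-wave resonances f_n = (2n - 1) * c / (4 L) for n = 1..n_formants."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

# Zebra finch example from the text: L = 15 mm, c = 350 m/s
for n, f in enumerate(tracheal_formants(0.015), start=1):
    print(f"F{n} = {f:.0f} Hz")

# A longer trachea lowers every resonance; 30 mm is an assumed comparison value
print([round(f) for f in tracheal_formants(0.030)])
```

Only the first resonance of a 15 mm tube falls inside the 2–9 kHz song range; the second is already near 17.5 kHz.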

The beak acts as a radiation filter. A closed beak attenuates high frequencies (low-pass), while an open beak emphasizes higher harmonics. Dynamic beak opening during song modifies the effective tube length and hence formant positions. Differentiating \(f_1 = c/(4L)\) gives the first-order shift of the first formant, so a shorter effective tube raises it:

\[ \Delta f_{\text{formant}} \approx -\frac{c}{4L^2} \Delta L_{\text{effective}} \]
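
To see how good the first-order estimate is, the sketch below compares it with the exact change in \(f_1 = c/(4L)\). Since \(df_1/dL = -c/(4L^2)\), a shorter effective tube gives a positive shift. The changes in effective length (±1 and ±2 mm) are assumed values for illustration.

```python
c = 350.0      # m/s, speed of sound in the trachea (from the text)
L = 0.015      # m, zebra finch tracheal length (from the text)

def f1(length_m):
    """First quarter-wave resonance."""
    return c / (4.0 * length_m)

def linear_shift(length_m, delta_l):
    """First-order estimate: delta_f = -(c / (4 L^2)) * delta_L."""
    return -c / (4.0 * length_m ** 2) * delta_l

for delta_mm in (-2.0, -1.0, 1.0, 2.0):   # assumed changes in effective length
    dL = delta_mm * 1e-3
    exact = f1(L + dL) - f1(L)
    approx = linear_shift(L, dL)
    print(f"dL = {delta_mm:+.0f} mm: exact {exact:+.0f} Hz, linear {approx:+.0f} Hz")
```

For a 15 mm tube, millimetre-scale changes move the first formant by several hundred hertz, and the linear estimate starts to drift from the exact value once the change exceeds about 10% of the length.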

Air sacs (connected to the syrinx) add parallel resonant cavities, further shaping the spectral output. The complete filter transfer function \(H(f)\) is the product of tracheal, beak, and air sac transfer functions. The output song spectrum is \(P_{\text{out}}(f) = |H(f)|^2 \cdot P_{\text{source}}(f)\).

Figure 2: Schematic Spectrogram of Bird Song

3. Vocal Learning: Neural Circuits & FOXP2

Vocal learning — acquiring vocalizations through auditory experience and practice — is taxonomically rare: among birds it is found only in three independent lineages (oscine passerines/songbirds, parrots, hummingbirds), plus cetaceans, bats, elephants, seals, and humans among mammals. This convergent evolution suggests strong selective pressure and shared neurobiological mechanisms.

3.1 Song System Neural Circuits

Motor Pathway (Song Production)

HVC (formerly High Vocal Centre): timing nucleus; projects to RA and Area X. Contains time-locked neurons firing at precise moments during song. Lesion: song disrupted.

RA (Robust nucleus of Arcopallium): receives from HVC; projects to nXIIts (syrinx motor neurons) and DM (respiratory). Lesion: song abolished.

nXIIts (Tracheosyringeal hypoglossal nerve nucleus): final motor output to syringeal muscles. Direct control of membrane tension and glottal aperture.

Anterior Forebrain Pathway (AFP, Learning)

Area X: Striatal nucleus (basal ganglia analogue). Receives from HVC. Critical for song learning, not production. Contains dopaminergic reward signals.

DLM (Medial nucleus of dorsolateral thalamus): thalamic relay. Connects Area X to LMAN.

LMAN (Lateral Magnocellular Nucleus of Anterior Nidopallium): cortical analogue. Projects to RA. Introduces variability (exploration) during learning. Lesion in adults: little immediate effect on song; lesion in juveniles: learning disrupted, with premature crystallization of an impoverished song.

During the sensory phase (∼10–65 days post-hatch in zebra finch), the juvenile memorizes the tutor song into an auditory template in HVC and LMAN. During the sensorimotor phase (∼25–90 days post-hatch), the bird practices singing, comparing its output to the template via error-correcting plasticity in the AFP circuit (reinforcement learning). Dopaminergic projections from VTA to Area X signal reward and guide synaptic modification (BDNF/NTRK2-dependent LTP in RA).
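
The trial-and-error logic of the sensorimotor phase can be caricatured in a few lines. This is a deliberately crude toy, not a model of the AFP: a single pitch parameter stands in for the whole song, Gaussian noise stands in for LMAN-driven variability, and "keep the variant if it is closer to the template" stands in for a reward signal. All names and numbers are assumptions for illustration.

```python
import numpy as np

def learn_pitch(target_hz, start_hz, explore_sd_hz, n_trials, seed=0):
    """Toy trial-and-error learner for a single pitch parameter.

    Each trial sings the current pitch plus random exploration. The variant
    is kept only if it lands closer to the template than the current pitch.
    """
    rng = np.random.default_rng(seed)
    pitch = start_hz
    errors = [abs(pitch - target_hz)]
    for _ in range(n_trials):
        attempt = pitch + rng.normal(0.0, explore_sd_hz)
        if abs(attempt - target_hz) < abs(pitch - target_hz):
            pitch = attempt            # reinforce the better variant
        errors.append(abs(pitch - target_hz))
    return pitch, errors

# All numbers are assumed for illustration
final_pitch, errors = learn_pitch(target_hz=5000.0, start_hz=4000.0,
                                  explore_sd_hz=100.0, n_trials=500)
print(f"initial error: {errors[0]:.0f} Hz, final error: {errors[-1]:.1f} Hz")
```

With `explore_sd_hz=0` the pitch never changes, a rough analogue of the idea that learning depends on exploratory variability. Published models use gradient-like reinforcement rules rather than this simple accept-or-reject scheme.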

3.2 FOXP2: The Language Gene

FOXP2 (Forkhead Box P2) is a transcription factor containing a forkhead DNA-binding domain. It is expressed in HVC and Area X in songbirds, and its expression level changes seasonally (higher during song learning periods). Key findings:

● Knockdown of FOXP2 in Area X of zebra finch during sensorimotor learning produces abnormal, inaccurate song that fails to match the tutor template.
● The human FOXP2 sequence differs from mouse at only 3 amino acids, but two of these are specific to the human lineage and arose after the human-chimpanzee split (∼6 Mya).
● Mutations in human FOXP2 cause a severe speech and language disorder (verbal dyspraxia), establishing it as critical for the fine oral-motor control underlying speech.
● FOXP2 regulates downstream targets including CNTNAP2 (axon guidance), SLIT1 (neuronal migration), and MAP1B (synaptic plasticity) in both birds and humans.
● Humanized mice (carrying human FOXP2 amino acid substitutions) show altered ultrasonic vocalizations and enhanced basal ganglia synaptic plasticity, suggesting FOXP2 drove changes in motor learning circuitry.

4. Python: FM Synthesis, Spectrogram & Tracheal Filtering

The script below generates synthetic bird song using frequency modulation (FM) synthesis, computes its spectrogram, and applies a tracheal formant filter to the harmonic source spectrum.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from scipy.signal import spectrogram as scipy_spectrogram

fig = plt.figure(figsize=(14, 12))
gs = gridspec.GridSpec(3, 2, figure=fig, hspace=0.50, wspace=0.38)

SR = 44100   # sample rate Hz
dt = 1.0 / SR

# ============================================================
# Panel A: FM-synthesized syllable waveform
# ============================================================
ax1 = fig.add_subplot(gs[0, :])

def fm_syllable(t_start, duration, f_carrier, f_mod, mod_depth, amplitude=0.8, sr=SR):
    n_samples = int(duration * sr)
    t = np.arange(n_samples) / sr
    # FM synthesis: x(t) = A * sin(2*pi*fc*t + mod_depth * sin(2*pi*fm*t))
    phase = 2 * np.pi * f_carrier * t + mod_depth * np.sin(2 * np.pi * f_mod * t)
    signal = amplitude * np.sin(phase)
    # Apply Hann envelope
    env = np.hanning(n_samples)
    return signal * env

# Build a song sequence: 6 syllables
total_dur  = 1.2   # seconds
total_samp = int(total_dur * SR)
song       = np.zeros(total_samp)

syllables = [
    # (start_s, dur_s, f_carrier, f_mod, mod_depth, amp)
    # Peak frequency deviation of each syllable is |mod_depth| * f_mod
    (0.05,  0.12,  3500, 20, 8,   0.9),   # A: slow vibrato, +/-160 Hz around 3.5 kHz
    (0.22,  0.10,  6000, 15, -6,  0.8),   # B: slow vibrato, +/-90 Hz, opposite phase
    (0.38,  0.15,  4200, 10, 0,   0.7),   # C: pure tone (no modulation)
    (0.60,  0.04,  5000, 80, 3,   0.75),  # D: trill element 1
    (0.66,  0.04,  5200, 80, 3,   0.75),  # D: trill element 2
    (0.72,  0.04,  5400, 80, 3,   0.70),  # D: trill element 3
    (0.85,  0.18,  3800, 12, 2,   0.8),   # E: two-voice simulation (low voice)
    (0.85,  0.18,  7200, 8,  1.5, 0.5),   # E: two-voice simulation (high voice)
]

for (t0, dur, fc, fm, md, amp) in syllables:
    syl = fm_syllable(t0, dur, fc, fm, md, amp)
    i0  = int(t0 * SR)
    song[i0:i0+len(syl)] += syl

# Add slight noise
song += 0.015 * np.random.randn(total_samp)

t_axis = np.arange(total_samp) / SR
ax1.plot(t_axis[:SR//5], song[:SR//5], color='#38bdf8', linewidth=0.7, alpha=0.85)
ax1.set_xlabel('Time (s)', fontsize=10)
ax1.set_ylabel('Amplitude', fontsize=10)
ax1.set_title('FM-Synthesized Bird Song Waveform (first 0.20 s)', fontsize=11)
ax1.grid(True, alpha=0.2)

# ============================================================
# Panel B: Spectrogram
# ============================================================
ax2 = fig.add_subplot(gs[1, :])

f_spec, t_spec, Sxx = scipy_spectrogram(song, fs=SR, window='hann',
                                         nperseg=1024, noverlap=896,
                                         nfft=2048)
# Show 0-12 kHz
fmask = f_spec < 12000
Sxx_db = 10 * np.log10(Sxx[fmask, :] + 1e-12)
Sxx_db_norm = Sxx_db - Sxx_db.max()

im = ax2.pcolormesh(t_spec, f_spec[fmask]/1000, Sxx_db_norm,
                    shading='gouraud', cmap='inferno', vmin=-50, vmax=0)
plt.colorbar(im, ax=ax2, label='Power (dB, normalized)')
ax2.set_xlabel('Time (s)', fontsize=10)
ax2.set_ylabel('Frequency (kHz)', fontsize=10)
ax2.set_title('Spectrogram of Synthetic Bird Song', fontsize=11)
ax2.set_xlim(0, total_dur)
ax2.set_ylim(0, 12)

# Mark syllable positions
syl_labels = ['A','B','C','D1','D2','D3','E']
for i, (t0, dur, fc, *rest) in enumerate(syllables[:7]):
    ax2.annotate(syl_labels[i], xy=(t0 + dur/2, fc/1000 + 0.8),
                fontsize=8, color='white', ha='center',
                bbox=dict(boxstyle='round,pad=0.2', facecolor='#1e293b', alpha=0.7))

# ============================================================
# Panel C: Harmonic source spectrum
# ============================================================
ax3 = fig.add_subplot(gs[2, 0])

f0 = 3500.0   # fundamental Hz
N  = 20
n_harmonics = np.arange(1, N+1)
f_harmonics = n_harmonics * f0
# Source: roll-off ~ n^(-1.5)
source_amp = n_harmonics ** (-1.5)

ax3.stem(f_harmonics/1000, source_amp, linefmt='#38bdf8', markerfmt='o', basefmt=' ')
ax3.set_xlabel('Frequency (kHz)', fontsize=10)
ax3.set_ylabel('Relative Amplitude', fontsize=10)
ax3.set_title(f'Syrinx Source Spectrum\nf0={f0:.0f} Hz, roll-off n^(-1.5)', fontsize=10)
ax3.set_xlim(0, 25)
ax3.grid(True, alpha=0.25)

# ============================================================
# Panel D: Tracheal filter + filtered output
# ============================================================
ax4 = fig.add_subplot(gs[2, 1])

c_sound = 350.0    # m/s inside trachea (37 C, humid)
L_trachea = 0.015  # 15 mm for small songbird

f_cont = np.linspace(100, 25000, 5000)
# Quarter-wave tube transfer function (magnitude)
# Lossless ideal: H(f) = 1/cos(pi*f/(2*f1)), where f1 = c/(4L) is the first resonance
f1 = c_sound / (4 * L_trachea)

# Damped tracheal filter (peaks at odd multiples of f1)
# Approximation: |H|^2 = 1 / (1 - R^2*sin^2(k*L)) = 1 / ((1 - R^2) + R^2*cos^2(k*L)),
# with R = 0.65 setting the radiation loss
k_L = np.pi * f_cont / (2 * f1)
R_rad = 0.65
H_trachea = 1.0 / np.sqrt(1 - (R_rad * np.sin(k_L))**2)
H_trachea /= H_trachea.max()

# Source spectrum (continuous)
source_cont = (f_cont / f0) ** (-1.5)
source_cont[f_cont < f0] = 0
source_cont /= source_cont.max()

filtered = source_cont * H_trachea
filtered /= filtered.max()

ax4.plot(f_cont/1000, source_cont, color='#94a3b8', linewidth=1.5, alpha=0.7, label='Source (syrinx)')
ax4.plot(f_cont/1000, H_trachea,   color='#38bdf8', linewidth=1.5, alpha=0.8, label='Tracheal filter |H(f)|', linestyle='--')
ax4.plot(f_cont/1000, filtered,    color='#4ade80', linewidth=2.0, label='Filtered output')

# Mark formants
for n in range(1, 6):
    fn = (2*n - 1) * f1
    if fn < 25000:
        ax4.axvline(fn/1000, color='#f59e0b', linewidth=0.8, linestyle=':', alpha=0.6)
        ax4.text(fn/1000, 1.05, f'F{n}', fontsize=7.5, color='#fbbf24', ha='center')

ax4.set_xlabel('Frequency (kHz)', fontsize=10)
ax4.set_ylabel('Relative Amplitude', fontsize=10)
ax4.set_title(f'Source-Filter Model\nTrachea L={L_trachea*1000:.0f}mm, F1={f1:.0f}Hz', fontsize=10)
ax4.legend(fontsize=8, loc='upper right')
ax4.set_xlim(0, 25)
ax4.grid(True, alpha=0.25)
ax4.set_ylim(0, 1.15)

fig.suptitle('Avian Acoustic Biophysics: FM Synthesis, Spectrogram & Source-Filter Model', fontsize=13, fontweight='bold')
plt.savefig('output.png', dpi=130, bbox_inches='tight', facecolor='#0f172a')
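
As a side check on the FM syllables, the instantaneous frequency of \(x(t) = \sin(2\pi f_c t + \beta \sin(2\pi f_m t))\) is \(f_c + \beta f_m \cos(2\pi f_m t)\), so the peak deviation is \(|\beta| f_m\). The sketch below confirms this numerically for the first trill element's parameters; the helper names are hypothetical.

```python
import numpy as np

def fm_peak_deviation(f_mod, mod_depth):
    """Peak deviation (Hz) of sin(2*pi*fc*t + mod_depth*sin(2*pi*f_mod*t))."""
    return abs(mod_depth) * f_mod

def instantaneous_frequency(phase, sr):
    """Finite-difference instantaneous frequency (Hz) from a phase array in radians."""
    return np.diff(phase) * sr / (2 * np.pi)

sr = 44100
fc, f_mod, mod_depth, dur = 5000.0, 80.0, 3.0, 0.04   # trill element 1 from the script
t = np.arange(int(dur * sr)) / sr
phase = 2 * np.pi * fc * t + mod_depth * np.sin(2 * np.pi * f_mod * t)
f_inst = instantaneous_frequency(phase, sr)

print(f"analytic peak deviation: {fm_peak_deviation(f_mod, mod_depth):.0f} Hz")
print(f"numerical range: {f_inst.min():.0f} to {f_inst.max():.0f} Hz")
```

A syllable with a low modulation rate and short duration therefore traces only a few slow wobbles around the carrier rather than a one-directional sweep.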
