Biophysics of Hearing
Sound wave physics, basilar membrane mechanics, hair cell transduction, the cochlear amplifier, and otoacoustic emissions — the remarkable biophysics of auditory perception
Why Hearing Biophysics?
The human ear is an astonishing transducer: it detects sound pressure variations as small as 20 micropascals (the threshold of hearing), corresponding to eardrum displacements of roughly 10 picometres, less than the diameter of a hydrogen atom. It does this over a frequency range of 20 Hz to 20 kHz and a dynamic range of 120 dB (a factor of $10^{12}$ in intensity). Understanding how this is achieved requires physics spanning acoustics, fluid mechanics, nonlinear dynamics, and molecular biophysics.
This chapter derives the wave equation for sound, the mechanics of the basilar membrane's tonotopic map, hair cell mechanotransduction, the active cochlear amplifier, and the phenomenon of otoacoustic emissions — sound generated by the ear itself.
1. Sound Wave Physics
Sound is a longitudinal pressure wave propagating through a compressible medium. We derive the fundamental wave equation from first principles using Newton's second law and the equation of state for the medium.
Derivation: The Acoustic Wave Equation
Consider a fluid element of cross-sectional area $A$ and thickness $dx$ at equilibrium position $x$. Let $\xi(x,t)$ be the displacement from equilibrium. The pressure at any point is $P = P_0 + p(x,t)$, where $p$ is the acoustic pressure perturbation.
Step 1 — Conservation of mass (continuity): The fractional change in volume of the element is:
$$\frac{\Delta V}{V} = \frac{\partial \xi}{\partial x}$$
For an adiabatic process in an ideal gas, $PV^\gamma = \text{const}$, giving the bulk modulus $B = -V \, dP/dV = \gamma P_0$. The acoustic pressure relates to compression as:
$$p = -B \frac{\partial \xi}{\partial x} = -\gamma P_0 \frac{\partial \xi}{\partial x}$$
Step 2 — Newton's second law: The net force on the element is $[p(x) - p(x+dx)]A = -(\partial p / \partial x) A \, dx$. The mass is $\rho_0 A \, dx$. Therefore:
$$\rho_0 \frac{\partial^2 \xi}{\partial t^2} = -\frac{\partial p}{\partial x}$$
Step 3 — Combine: Substituting the expression for $p$ into Newton's law:
$$\rho_0 \frac{\partial^2 \xi}{\partial t^2} = B \frac{\partial^2 \xi}{\partial x^2}$$
Differentiating Newton's law with respect to $x$ and eliminating the strain through $\partial \xi / \partial x = -p/B$ gives the same wave equation directly for the acoustic pressure:
$$\boxed{\frac{\partial^2 p}{\partial x^2} = \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2}}$$
where the speed of sound is $c = \sqrt{B/\rho_0} = \sqrt{\gamma P_0 / \rho_0}$. For air at 20°C: $c \approx 343$ m/s.
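As a quick numerical check of $c = \sqrt{\gamma P_0 / \rho_0}$, the sketch below evaluates the formula for air. The values of $\gamma$, $P_0$ and $\rho_0$ are standard figures for dry air at 20°C and 1 atm; they are assumptions of this example, since the text quotes only the result.

```python
import math

def speed_of_sound(gamma, p0, rho0):
    """c = sqrt(gamma * P0 / rho0) for adiabatic compression of an ideal gas."""
    return math.sqrt(gamma * p0 / rho0)

# Assumed values for dry air at 20 degrees C and 1 atm (not given in the text).
GAMMA_AIR = 1.4       # adiabatic index of a diatomic gas
P0_AIR = 101_325.0    # ambient pressure, Pa
RHO0_AIR = 1.204      # ambient density, kg/m^3

c_air = speed_of_sound(GAMMA_AIR, P0_AIR, RHO0_AIR)
print(f"c = {c_air:.1f} m/s")
```

With these inputs the result lands within about 1 m/s of the 343 m/s quoted above.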
Sound Intensity and the Decibel Scale
For a plane harmonic wave $p(x,t) = p_0 \sin(kx - \omega t)$, the instantaneous intensity (power per unit area) is $I_{\text{inst}} = p \cdot v$, where $v = p/(\rho_0 c)$ is the particle velocity. Time-averaging over one period, with $\langle \sin^2 \rangle = 1/2$:
$$\boxed{I = \frac{p_{\text{rms}}^2}{\rho_0 c} = \frac{p_0^2}{2\rho_0 c}}$$
The product $\rho_0 c$ is the characteristic acoustic impedance of the medium. For air, $\rho_0 c \approx 413$ Pa·s/m.
The ear responds to an enormous range of intensities. The decibel scale compresses this logarithmically:
$$\boxed{L = 10 \log_{10}\!\left(\frac{I}{I_0}\right) \quad \text{dB SPL}}$$
where $I_0 = 10^{-12}$ W/m$^2$ is the reference intensity, corresponding to an rms pressure of $p_{\text{ref}} = 20 \; \mu$Pa, the threshold of hearing at 1 kHz. Equivalently, $L = 20 \log_{10}(p_{\text{rms}}/p_{\text{ref}})$. The threshold of pain is at approximately $I = 1$ W/m$^2$ (120 dB SPL), corresponding to $p_{\text{rms}} \approx 20$ Pa.
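The conversions between rms pressure, intensity and level can be sketched in a few lines. The function names are hypothetical; the constants are the ones quoted above. The two reference values (20 µPa and $10^{-12}$ W/m$^2$) agree only to within a fraction of a decibel because $\rho_0 c$ is not exactly 400 Pa·s/m.

```python
import math

RHO0_C = 413.0   # characteristic impedance of air, Pa*s/m (from the text)
I_REF = 1e-12    # reference intensity, W/m^2
P_REF = 20e-6    # reference rms pressure, Pa

def intensity_from_pressure(p_rms):
    """Plane-wave intensity I = p_rms^2 / (rho0 * c)."""
    return p_rms ** 2 / RHO0_C

def level_from_intensity(intensity):
    """Sound level in dB re 1e-12 W/m^2."""
    return 10.0 * math.log10(intensity / I_REF)

def level_from_pressure(p_rms):
    """Sound pressure level in dB re 20 uPa."""
    return 20.0 * math.log10(p_rms / P_REF)

for p_rms in (20e-6, 2e-3, 20.0):
    spl = level_from_pressure(p_rms)
    sil = level_from_intensity(intensity_from_pressure(p_rms))
    print(f"p_rms = {p_rms:g} Pa: {spl:6.1f} dB (pressure), {sil:6.1f} dB (intensity)")
```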
2. Basilar Membrane Mechanics
The cochlea is a fluid-filled, coiled tube divided longitudinally by the basilar membrane (BM). The remarkable property of the BM is its tonotopic organisation: different frequencies excite different positions along its length, creating a spatial frequency map.
Derivation: The Tonotopic Map
The BM varies systematically along its 35 mm length: it is narrow and stiff at the base (width ~0.1 mm) and wide and floppy at the apex (width ~0.5 mm). Model the BM as a tapered elastic beam. The local resonant frequency depends on stiffness $k(x)$ and mass per unit length $m(x)$:
$$f(x) = \frac{1}{2\pi}\sqrt{\frac{k(x)}{m(x)}}$$
Experimentally, the stiffness decreases approximately exponentially from base to apex: $k(x) = k_0 \, e^{-2x/d}$, where $d \approx 5$ mm is the length constant. With mass varying more slowly, $f \propto \sqrt{k}$ gives the exponential tonotopic map:
$$\boxed{f(x) = f_{\max} \cdot e^{-x/d}}$$
where $f_{\max} \approx 20{,}000$ Hz at the base ($x = 0$) and $f_{\min} \approx 20$ Hz at the apex ($x = 35$ mm). Inverting:
$$x(f) = d \cdot \ln\!\left(\frac{f_{\max}}{f}\right)$$
This is a simplified, purely exponential form of Greenwood's frequency-position function. The logarithmic spacing means each octave occupies the same length of BM (~3.5 mm, since the 35 mm membrane spans about ten octaves), consistent with our perception of pitch as logarithmic in frequency.
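The map and its inverse are easy to evaluate numerically. In the sketch below the length constant is not entered by hand but fixed by the two endpoint frequencies quoted above (20 kHz at $x = 0$, 20 Hz at $x = 35$ mm), which is an assumption of this example; the function names are hypothetical.

```python
import math

F_MAX = 20_000.0    # Hz, at the base (x = 0)
F_MIN = 20.0        # Hz, at the apex
BM_LENGTH = 35.0    # mm

# Length constant implied by the two endpoints of the exponential map.
D = BM_LENGTH / math.log(F_MAX / F_MIN)

def place_to_frequency(x_mm):
    """Characteristic frequency (Hz) at distance x_mm from the base."""
    return F_MAX * math.exp(-x_mm / D)

def frequency_to_place(f_hz):
    """Distance from the base (mm) at which f_hz is the characteristic frequency."""
    return D * math.log(F_MAX / f_hz)

octave_length = D * math.log(2.0)   # mm of membrane per octave

print(f"d = {D:.2f} mm, one octave = {octave_length:.2f} mm")
for f in (8000.0, 1000.0, 125.0):
    print(f"{f:7.0f} Hz maps to x = {frequency_to_place(f):5.2f} mm")
```

Because the map is exponential, `frequency_to_place(f) - frequency_to_place(2 * f)` equals `octave_length` for any `f`.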
Derivation: The Traveling Wave (von Békésy)
Georg von Békésy (Nobel Prize 1961) showed that sound does not simply excite a resonant point on the BM. Instead, a traveling wave propagates from base to apex. Model the cochlea as a transmission line: two fluid chambers coupled through the elastic BM partition.
The pressure difference $\Delta p(x,t)$ across the BM drives its displacement $\eta(x,t)$. The fluid equations give:
$$\frac{\partial^2 \Delta p}{\partial x^2} = \frac{2\rho}{A} \frac{\partial^2 \eta}{\partial t^2}$$
where $A$ is the cross-sectional area of each chamber and $\rho$ is fluid density. The BM responds locally:
$$m(x)\frac{\partial^2 \eta}{\partial t^2} + r(x)\frac{\partial \eta}{\partial t} + k(x)\eta = \Delta p$$
For harmonic excitation at frequency $\omega$, the local wavenumber is:
$$\kappa(x) = \sqrt{\frac{2\rho \omega^2}{A[k(x) - m(x)\omega^2 + i r(x)\omega]}}$$
The wave propagates with increasing amplitude and decreasing wavelength until it reaches the characteristic place where $k(x) = m(x)\omega^2$ (resonance). Beyond this point, the wave is evanescent and decays rapidly. The quality factor of the passive BM is $Q \approx 1$–$3$, so the frequency selectivity of the passive membrane is rather poor; the active process sharpens it dramatically.
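The behaviour of $\kappa(x)$ can be explored with a short script. Everything numerical here is a placeholder chosen only to make the qualitative trend visible: the fluid density, chamber area, constant mass, the damping law $r(x) = \sqrt{k(x)\,m}/Q$ with $Q = 2$, and a length constant fixed by the 20 kHz to 20 Hz span over 35 mm. None of these values is measured cochlear data.

```python
import cmath
import math

# Illustrative placeholder parameters (assumptions, not measured values).
RHO = 1000.0                              # fluid density, kg/m^3
AREA = 1.0e-6                             # chamber cross-section A, m^2
MASS = 0.05                               # m(x), taken constant along the BM
F_BASE = 20_000.0                         # resonant frequency at the base, Hz
K0 = MASS * (2 * math.pi * F_BASE) ** 2   # base stiffness giving f(0) = F_BASE
D = 35e-3 / math.log(1000.0)              # length constant, m
Q_PASSIVE = 2.0                           # within the quoted passive range 1-3

def stiffness(x):
    return K0 * math.exp(-2.0 * x / D)

def damping(x):
    # Chosen so that every place has the same local quality factor.
    return math.sqrt(stiffness(x) * MASS) / Q_PASSIVE

def wavenumber(x, omega):
    """Complex local wavenumber kappa(x) from the dispersion relation above."""
    denom = AREA * (stiffness(x) - MASS * omega ** 2 + 1j * damping(x) * omega)
    return cmath.sqrt(2.0 * RHO * omega ** 2 / denom)

def characteristic_place(omega):
    """Position where k(x) = m * omega^2."""
    return 0.5 * D * math.log(K0 / (MASS * omega ** 2))

def local_wavelength(x, omega):
    return 2.0 * math.pi / wavenumber(x, omega).real

omega = 2 * math.pi * 1000.0
x_cf = characteristic_place(omega)
for frac in (0.0, 0.5, 0.9, 1.5):
    kappa = wavenumber(frac * x_cf, omega)
    print(f"x = {frac:.1f} x_cf: Re(kappa) = {kappa.real:.3g}, Im(kappa) = {kappa.imag:.3g}")
```

Approaching the characteristic place the real part of $\kappa$ grows (the wavelength shortens); beyond it the imaginary part dominates, which is the evanescent decay described above.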
3. Hair Cell Mechanotransduction
The organ of Corti, sitting atop the basilar membrane, contains ~3,500 inner hair cells (IHCs) and ~12,000 outer hair cells (OHCs). Each hair cell has a bundle of stereocilia connected by tip links — fine filaments that gate mechanosensitive ion channels directly.
Derivation: The Gating Spring Model
When the stereocilia bundle is deflected by a displacement $X$, the tip link exerts a force on the transduction channel. Model the tip link as a gating spring of stiffness $\kappa_g$. The channel has two states: closed (C) and open (O), with the open state having the gate swung through a distance $z \approx 4$ nm (the single-channel gating swing).
The energy difference between open and closed states depends on the applied force:
$$\Delta G = \Delta G_0 - \kappa_g z \left(X - X_0\right)$$
where $\Delta G_0$ is the intrinsic energy difference and $X_0$ is the set point. Boltzmann statistics give $P_{\text{open}} = 1/(1 + e^{\Delta G / k_B T})$; absorbing $\Delta G_0$ into the set point $X_0$, the open probability is:
$$\boxed{P_{\text{open}} = \frac{1}{1 + \exp\!\left(-\frac{\kappa_g z (X - X_0)}{k_B T}\right)}}$$
This is a Boltzmann sigmoid with sensitivity set by the ratio $\kappa_g z / k_B T$. With $\kappa_g \approx 1$ mN/m and $z \approx 4$ nm, the characteristic displacement is:
$$X_c = \frac{k_B T}{\kappa_g z} \approx \frac{4.1 \times 10^{-21}}{10^{-3} \times 4 \times 10^{-9}} \approx 1 \; \text{nm}$$
The operating range of the hair cell is thus only a few nanometres — astoundingly small!
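The size of the operating range follows directly from the sigmoid. The sketch below uses the parameter values quoted above and measures the deflection needed to take $P_{\text{open}}$ from 10% to 90%; the 10–90% criterion and the function names are choices made for this example.

```python
import math

K_B_T = 4.1e-21     # thermal energy at body temperature, J
KAPPA_G = 1.0e-3    # gating-spring stiffness, N/m
Z_SWING = 4.0e-9    # gating swing, m

X_C = K_B_T / (KAPPA_G * Z_SWING)   # characteristic displacement, m

def p_open(x, x0=0.0):
    """Boltzmann open probability for bundle displacement x (metres)."""
    return 1.0 / (1.0 + math.exp(-(x - x0) / X_C))

# Displacement span over which P_open rises from 10% to 90%.
operating_range = 2.0 * X_C * math.log(9.0)

print(f"X_c = {X_C * 1e9:.2f} nm, 10-90% range = {operating_range * 1e9:.1f} nm")
for x_nm in (-4, -2, 0, 2, 4):
    print(f"X - X0 = {x_nm:+d} nm: P_open = {p_open(x_nm * 1e-9):.3f}")
```

The function is meant for displacements of a few tens of nanometres at most; very large negative arguments would overflow `math.exp`.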
Transduction Current and Adaptation
Each stereocilium tip has ~2 transduction channels with single-channel conductance $\gamma \approx 100$ pS. With a driving potential of ~140 mV (the endolymphatic potential of +80 mV minus the resting potential of -60 mV), the single-channel current is:
$$i = \gamma \times V_{\text{drive}} = 100 \times 10^{-12} \times 0.14 \approx 14 \; \text{pA}$$
The total transduction current for $N \approx 100$ channels per bundle:
$$\boxed{I_{\text{trans}} = N \cdot i \cdot P_{\text{open}}(X)}$$
Adaptation is mediated by myosin-1c motors that climb along the actin core of the stereocilium, adjusting the resting tension on the tip link. This shifts $X_0$ to keep $P_{\text{open}}$ near ~5% at rest, so that some channels remain open and the cell can signal both positive and negative deflections (the sigmoid itself is steepest at $P_{\text{open}} = 0.5$). The adaptation time constant is $\tau_{\text{adapt}} \approx 1$–$10$ ms for fast adaptation and $\sim 100$ ms for slow adaptation.
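The current estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in this section; the function name is hypothetical.

```python
GAMMA_CHANNEL = 100e-12   # single-channel conductance, S
V_ENDOLYMPH = 0.080       # endolymphatic potential, V
V_REST = -0.060           # hair cell resting potential, V
N_CHANNELS = 100          # channels per bundle

v_drive = V_ENDOLYMPH - V_REST          # driving potential across the channel
i_single = GAMMA_CHANNEL * v_drive      # single-channel current, A

def transduction_current(p_open):
    """Total bundle current for a given open probability."""
    return N_CHANNELS * i_single * p_open

print(f"driving potential = {v_drive * 1e3:.0f} mV")
print(f"single-channel current = {i_single * 1e12:.0f} pA")
print(f"resting current (P_open = 0.05) = {transduction_current(0.05) * 1e12:.0f} pA")
print(f"saturating current (P_open = 1) = {transduction_current(1.0) * 1e9:.1f} nA")
```

With a resting open probability of 5% the standing current is 70 pA, against a saturating current of 1.4 nA.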
4. The Cochlear Amplifier
The passive basilar membrane has poor frequency selectivity ($Q \approx 1$–$3$). Yet live cochleae show far sharper tuning ($Q_{10\text{dB}}$ up to ~50 at high frequencies), and the ear can detect displacements below 1 pm. Thomas Gold predicted in 1948 that the cochlea must contain an active amplifier. This was confirmed when William Brownell discovered outer hair cell electromotility in 1985.
Derivation: Outer Hair Cell Electromotility
Outer hair cells (OHCs) change length when their membrane potential changes. This electromotility is driven by the protein prestin (SLC26A5) densely packed in the OHC lateral wall (~8,000 motors/$\mu$m$^2$).
Prestin acts as a piezoelectric element. The fractional length change is proportional to the membrane potential change:
$$\boxed{\frac{\Delta L}{L} = \alpha \cdot \Delta V}$$
where $\alpha \approx 7 \times 10^{-4}$ mV$^{-1}$ (of order $10^{-3}$/mV), equivalent to a length change of about 20 nm/mV for an OHC of length $L \approx 30 \; \mu$m. The associated nonlinear capacitance follows a Boltzmann function:
$$C_{\text{NL}}(V) = 4\,C_{\max} \cdot \frac{\exp\!\left(\frac{ze(V - V_{1/2})}{k_B T}\right)}{\left[1 + \exp\!\left(\frac{ze(V - V_{1/2})}{k_B T}\right)\right]^2}$$
where $V_{1/2} \approx -40$ mV is the voltage of peak capacitance and $ze \approx 0.8e$ is the effective charge movement.
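The bell shape of the nonlinear capacitance can be tabulated as follows. To stay independent of how the prefactor is normalised, this sketch reports the capacitance relative to its peak value at $V_{1/2}$; the function names and the value used for the elementary charge are assumptions of the example.

```python
import math

K_B_T = 4.1e-21        # thermal energy, J
E_CHARGE = 1.602e-19   # elementary charge, C
Z_EFF = 0.8            # effective valence (ze = 0.8e)
V_HALF = -40e-3        # voltage of peak capacitance, V

def boltzmann_shape(v):
    """The factor exp(u) / (1 + exp(u))^2 with u = ze (V - V_half) / kT."""
    b = math.exp(Z_EFF * E_CHARGE * (v - V_HALF) / K_B_T)
    return b / (1.0 + b) ** 2

def relative_capacitance(v):
    """Nonlinear capacitance divided by its peak value."""
    return boltzmann_shape(v) / boltzmann_shape(V_HALF)

voltage_scale = K_B_T / (Z_EFF * E_CHARGE)   # sets the width of the bell, V

print(f"kT/(ze) = {voltage_scale * 1e3:.1f} mV")
for v_mv in (-120, -80, -40, 0, 40):
    print(f"V = {v_mv:+4d} mV: C_NL / C_peak = {relative_capacitance(v_mv * 1e-3):.3f}")
```

The shape factor peaks at exactly 1/4 and is symmetric about $V_{1/2}$, with a width set by $k_B T/(ze) \approx 32$ mV.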
Derivation: Amplification Gain
The OHC operates in a positive feedback loop: BM motion $\to$ stereocilia deflection $\to$ transduction current $\to$ membrane potential change $\to$ OHC length change $\to$ enhanced BM motion. The loop gain is:
$$G_{\text{loop}} = \frac{\partial P_{\text{open}}}{\partial X} \cdot i \cdot N \cdot \frac{1}{C_m \omega} \cdot \alpha \cdot L \cdot \frac{k_{\text{BM}}}{\Delta p_{\text{BM}}}$$
Each factor represents a stage: mechanotransduction sensitivity, current, RC filtering, electromotility, and mechanical coupling. The net gain at the characteristic frequency can be as high as:
$$\boxed{G \approx 40 \text{--} 60 \; \text{dB} \quad (\times 100 \text{--} 1000 \; \text{in amplitude})}$$
This active amplification is what allows the ear to detect basilar membrane vibrations of less than 1 picometre — smaller than the diameter of an atom! The gain is compressive: large at low sound levels (soft sounds are amplified most) and decreasing at high levels, giving a roughly $\frac{1}{3}$-power compressive nonlinearity:
$$\eta_{\text{BM}} \propto p_{\text{input}}^{1/3}$$
This compression maps the enormous 120 dB input range onto a manageable ~40 dB range of BM vibration, enabling both sensitivity and resistance to overload.
5. Otoacoustic Emissions
A striking consequence of the active cochlea is that it can generate sound. These otoacoustic emissions (OAEs) were first detected by David Kemp in 1978 and are now routinely used for newborn hearing screening.
Derivation: Distortion Product OAEs
When two tones at frequencies $f_1$ and $f_2$ ($f_2 > f_1$, $f_2/f_1 \approx 1.2$) are presented to the ear, the nonlinear mechanics of the cochlea generates distortion products at combination frequencies. The compressive nonlinearity can be expanded as a power series:
$$\eta = a_1 p + a_2 p^2 + a_3 p^3 + \cdots$$
With two-tone input $p = A_1\cos(\omega_1 t) + A_2\cos(\omega_2 t)$, the cubic term generates:
$$a_3 p^3 \supset \frac{3a_3}{4} A_1^2 A_2 \cos\!\big[(2\omega_1 - \omega_2)t\big] + \cdots$$
The dominant distortion product is at:
$$\boxed{f_{\text{dp}} = 2f_1 - f_2}$$
This distortion is generated at the overlap region of the two traveling waves, then propagates back through the cochlea, through the middle ear, and is emitted as an acoustic signal that can be recorded with a sensitive microphone in the ear canal. The DPOAE level is typically 40–60 dB below the stimulus level in a normal ear and is absent or reduced in ears with OHC damage — making it a powerful clinical tool.
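The cubic distortion product can be checked numerically by passing a two-tone signal through a memoryless cubic nonlinearity and projecting the output onto the frequency $2f_1 - f_2$. This is a minimal sketch: the coefficients `COEFF_1` and `COEFF_3`, the tone amplitudes and the sampling parameters are hypothetical values picked for illustration, and a real cochlea is not a memoryless polynomial.

```python
import math

def tone_amplitude(signal, freq, fs):
    """Amplitude of the component at freq, by projection onto cos and sin."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * k / fs) for k, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * k / fs) for k, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

FS = 48_000.0               # sampling rate, Hz
DURATION = 0.05             # s; every tone completes a whole number of cycles
F1, F2 = 1000.0, 1200.0     # primaries with f2/f1 = 1.2
A1, A2 = 1.0, 0.8           # primary amplitudes (arbitrary units)
COEFF_1, COEFF_3 = 1.0, -0.1   # hypothetical a1 and a3 (a3 < 0 is compressive)

n_samples = int(round(FS * DURATION))
times = [k / FS for k in range(n_samples)]
pressure = [A1 * math.cos(2 * math.pi * F1 * t) + A2 * math.cos(2 * math.pi * F2 * t)
            for t in times]
response = [COEFF_1 * p + COEFF_3 * p ** 3 for p in pressure]

F_DP = 2 * F1 - F2
dp_measured = tone_amplitude(response, F_DP, FS)
dp_predicted = abs(0.75 * COEFF_3 * A1 ** 2 * A2)   # |3 a3 / 4| A1^2 A2

print(f"f_dp = {F_DP:.0f} Hz")
print(f"measured amplitude  = {dp_measured:.6f}")
print(f"predicted amplitude = {dp_predicted:.6f}")
```

The measured component at 800 Hz matches the $\tfrac{3}{4} a_3 A_1^2 A_2$ prediction, while a frequency that is not a combination tone (900 Hz, say) carries no energy.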
Spontaneous Otoacoustic Emissions
Remarkably, about 70% of ears with normal hearing produce spontaneous OAEs (SOAEs) — tonal sounds emitted continuously without any external stimulus. These arise from the active cochlea's operating point being very close to a Hopf bifurcation (the boundary between stable and self-oscillating behaviour). The dynamics near the bifurcation are described by the normal form:
$$\frac{dz}{dt} = (\mu + i\omega_0)z - |z|^2 z + F(t)$$
where $z$ is the complex amplitude of BM oscillation, $\mu$ is the bifurcation parameter (negative = stable, positive = self-oscillating), $\omega_0$ is the natural frequency, and $F(t)$ is the external forcing. When $\mu < 0$ but close to zero, the system responds to external stimuli with the characteristic $\frac{1}{3}$-power compressive nonlinearity:
$$|z| \propto |F|^{1/3} \quad \text{for } |F| \gg |\mu|^{3/2}$$
This Hopf bifurcation model elegantly unifies the cochlear amplifier's key features: frequency selectivity, compressive nonlinearity, and spontaneous emissions all arise from a single dynamical mechanism.
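The cube-root law can be verified from the steady state of the normal form. For forcing at resonance, $F = f e^{i\omega_0 t}$ and $z = r e^{i\omega_0 t}$, the equation reduces to $r^3 - \mu r = f$. The sketch below solves this by bisection for $\mu \le 0$ and estimates the local exponent of the response; the value $\mu = -0.01$ and the function names are assumptions of the example.

```python
import math

def hopf_amplitude(force, mu):
    """Steady-state |z| at resonance: solve r**3 - mu*r = force for r >= 0."""
    if mu > 0:
        raise ValueError("this sketch covers only the stable side, mu <= 0")
    lo, hi = 0.0, max(1.0, force)   # the left-hand side is >= force at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 3 - mu * mid < force:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_slope(force, mu):
    """Local exponent d(log r)/d(log F), estimated over one decade of forcing."""
    return math.log10(hopf_amplitude(10.0 * force, mu) / hopf_amplitude(force, mu))

MU = -0.01   # hypothetical operating point just below the bifurcation
print(f"crossover forcing |mu|^(3/2) = {abs(MU) ** 1.5:.1e}")
for force in (1e-6, 1e-3, 1.0, 100.0):
    print(f"F = {force:8.0e}: |z| = {hopf_amplitude(force, MU):.4g}, "
          f"exponent = {log_slope(force, MU):.3f}")
```

Well below the crossover the response is linear (exponent near 1); well above it the exponent approaches 1/3, which is the compression described above.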
Clinical Significance of OAEs
DPOAEs and transient-evoked OAEs (TEOAEs) are now standard clinical tools for:
- Newborn hearing screening: Universal screening programmes use OAEs as the first-tier test. Present OAEs indicate functional OHCs and make moderate-to-severe cochlear hearing loss unlikely.
- Monitoring ototoxicity: Aminoglycoside antibiotics and cisplatin chemotherapy damage OHCs. Serial DPOAE monitoring detects damage before it becomes symptomatic, allowing dose adjustment.
- Differentiating cochlear vs. neural hearing loss: Present OAEs with absent auditory brainstem response suggest auditory neuropathy, a disorder of the IHC-to-nerve synapse or the auditory nerve itself.
The DPOAE "gram" plots emission amplitude as a function of $f_2$ frequency, providing a frequency-specific map of OHC function that correlates well with behavioural audiometric thresholds.
Middle Ear Mechanics: Impedance Matching
Before reaching the cochlea, sound must be transferred from air to the cochlear fluid. The acoustic impedance mismatch between air ($Z_{\text{air}} = \rho_{\text{air}} c_{\text{air}} \approx 413$ Pa·s/m) and water ($Z_{\text{water}} = \rho_{\text{water}} c_{\text{water}} \approx 1.5 \times 10^6$ Pa·s/m) would reflect most of the sound energy. The reflection coefficient is:
$$R = \left(\frac{Z_{\text{water}} - Z_{\text{air}}}{Z_{\text{water}} + Z_{\text{air}}}\right)^2 \approx 0.999$$
This means 99.9% of incident energy would be reflected — a 30 dB loss! The middle ear overcomes this through two mechanisms:
- Area ratio: The tympanic membrane area ($A_{\text{TM}} \approx 55$ mm$^2$) is much larger than the stapes footplate ($A_{\text{stapes}} \approx 3.2$ mm$^2$). The pressure gain is $A_{\text{TM}}/A_{\text{stapes}} \approx 17$.
- Lever ratio: The ossicular chain (malleus, incus, stapes) acts as a lever with advantage ~1.3.
Total pressure gain: $17 \times 1.3 \approx 22$, or about 27 dB — recovering most of the impedance mismatch loss. The transformer ratio is:
$$G_{\text{ME}} = \frac{A_{\text{TM}}}{A_{\text{stapes}}} \times \frac{l_{\text{malleus}}}{l_{\text{incus}}} \approx 22$$
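The impedance-matching numbers can be checked with the values quoted in this section. This is a small sketch; the variable names are hypothetical.

```python
import math

Z_AIR = 413.0       # Pa*s/m
Z_WATER = 1.5e6     # Pa*s/m
A_TM = 55.0         # tympanic membrane area, mm^2
A_STAPES = 3.2      # stapes footplate area, mm^2
LEVER_RATIO = 1.3   # ossicular lever advantage

# Energy reflection and transmission at a bare air-water interface.
reflection = ((Z_WATER - Z_AIR) / (Z_WATER + Z_AIR)) ** 2
transmission = 1.0 - reflection
mismatch_loss_db = -10.0 * math.log10(transmission)

# Pressure gain of the middle ear transformer.
pressure_gain = (A_TM / A_STAPES) * LEVER_RATIO
pressure_gain_db = 20.0 * math.log10(pressure_gain)

print(f"R = {reflection:.4f}, transmission loss = {mismatch_loss_db:.1f} dB")
print(f"middle ear pressure gain = {pressure_gain:.1f} ({pressure_gain_db:.1f} dB)")
```

The mismatch loss comes out just under 30 dB and the transformer gain close to 27 dB, so the middle ear recovers most, but not all, of the loss.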
Frequency Selectivity: Quality Factor
The sharpness of frequency tuning is characterised by the quality factor $Q$, defined as the centre frequency divided by the bandwidth at -3 dB (or -10 dB in auditory research, denoted $Q_{10\text{dB}}$):
$$Q_{10\text{dB}} = \frac{f_{\text{CF}}}{\Delta f_{10\text{dB}}}$$
In the passive cochlea (post-mortem or at high intensities), $Q_{10\text{dB}} \approx 1$–$3$. In the active cochlea at low sound levels, $Q_{10\text{dB}} \approx 5$–$15$ at low frequencies, increasing to $\sim 20$–$50$ at high frequencies. This means at 10 kHz, the tuning bandwidth is only ~200–500 Hz, remarkably sharp for a biological system.
The equivalent rectangular bandwidth (ERB) of an auditory filter at centre frequency $f$ (in Hz) is well approximated by:
$$\text{ERB}(f) = 24.7 \cdot (4.37 f / 1000 + 1) \; \text{Hz}$$
At 1 kHz: ERB $\approx 133$ Hz. At 4 kHz: ERB $\approx 456$ Hz. The ERB scale provides a perceptually uniform frequency axis.
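The ERB formula is simple enough to tabulate directly. The sketch below also reports the ratio $f/\text{ERB}$ as a rough sharpness measure; that ratio is not the same quantity as $Q_{10\text{dB}}$ and is included only for orientation.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f_hz (Hz)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

for f in (250.0, 1000.0, 4000.0, 10000.0):
    print(f"f = {f:7.0f} Hz: ERB = {erb_hz(f):6.1f} Hz, f/ERB = {f / erb_hz(f):.1f}")
```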
6. Applications
Cochlear Implants
Cochlear implants bypass damaged hair cells by electrically stimulating the auditory nerve directly. An electrode array is inserted into the scala tympani with electrodes spaced along the tonotopic axis. The electrode-nerve interface is governed by:
$$I_{\text{threshold}} = \frac{V_{\text{th}}}{\rho / (4\pi r)} = 4\pi r \sigma V_{\text{th}}$$
where $r$ is the electrode-neuron distance, $\sigma = 1/\rho$ is the tissue conductivity ($\rho$ here denotes resistivity, not density), and $V_{\text{th}} \approx 10$ mV is the neural threshold. Modern implants carry 12–22 electrode contacts, which is sufficient for speech comprehension.
The channel interaction problem limits cochlear implant performance. Current spreads through the conductive perilymph, so each electrode stimulates a broad region. The spatial spread of the electric field from a monopolar electrode decays as:
$$V(r) = \frac{I}{4\pi\sigma r} \exp(-r/\lambda)$$
where $\lambda \approx 3$–$5$ mm is the space constant in the cochlea. This broad current spread means effective spectral resolution is limited to 4–8 independent channels, well below the 22 physical electrodes. Bipolar and tripolar electrode configurations can narrow the current spread but require higher stimulation levels.
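The two implant formulas above can be combined in a short sketch. The conductivity (1.4 S/m), the 1 mm and 2 mm distances, and the 4 mm space constant are illustrative assumptions rather than values taken from a specific device or measurement; the function names are hypothetical.

```python
import math

def threshold_current(r, sigma, v_th):
    """Point-source threshold current I = 4 pi r sigma V_th (SI units)."""
    return 4.0 * math.pi * r * sigma * v_th

def potential(r, current, sigma, space_constant):
    """Monopolar potential V(r) = I / (4 pi sigma r) * exp(-r / lambda)."""
    return current / (4.0 * math.pi * sigma * r) * math.exp(-r / space_constant)

def spread_db(r_near, r_far, space_constant):
    """Potential at r_far relative to r_near, in dB (negative = attenuated)."""
    ratio = (r_near / r_far) * math.exp(-(r_far - r_near) / space_constant)
    return 20.0 * math.log10(ratio)

SIGMA = 1.4      # S/m, assumed conductivity of the cochlear fluid
V_TH = 10e-3     # V, neural threshold from the text
LAMBDA = 4e-3    # m, within the quoted 3-5 mm range

i_th = threshold_current(1e-3, SIGMA, V_TH)
print(f"threshold current at r = 1 mm: {i_th * 1e6:.0f} uA")
print(f"attenuation from 1 mm to 2 mm: {spread_db(1e-3, 2e-3, LAMBDA):.1f} dB")
```

With a space constant of several millimetres, a neuron twice as far from the electrode sees a potential only about 8 dB lower, which illustrates why neighbouring electrodes excite overlapping neural populations.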
Hearing Aids: Amplification Strategy
Modern digital hearing aids implement frequency-dependent amplification that mimics the lost cochlear amplifier gain. The prescription formula maps audiometric hearing loss $HL(f)$ (in dB) to the required insertion gain:
$$G_{\text{target}}(f) \approx \frac{HL(f)}{2} \quad \text{(half-gain rule, simplified)}$$
The half-gain rule approximates the fact that OHC damage eliminates the cochlear amplifier (40–60 dB gain) but the neural threshold remains near normal. Wide dynamic range compression (WDRC) is used to restore the compressive nonlinearity of the healthy cochlea:
$$L_{\text{out}} = L_{\text{TK}} + G_0 + \frac{L_{\text{in}} - L_{\text{TK}}}{CR} \quad \text{for } L_{\text{in}} \geq L_{\text{TK}}$$
where $L_{\text{TK}}$ is the compression threshold (knee-point, typically 40–50 dB SPL), $G_0$ is the linear gain applied below the knee-point, and $CR$ is the compression ratio (typically 2:1 to 3:1). Above the knee-point the gain falls by $(1 - 1/CR)$ dB for every dB of input, so soft sounds receive more gain than loud sounds, restoring something like the healthy cochlea's compressive input-output function.
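A compressor of this kind can be sketched as a piecewise-linear input-output function in dB. The gain applied below the knee-point (`linear_gain_db`) is an assumption of this sketch; setting it to zero leaves only the bare compression line. The default knee-point and ratio are picked from the typical ranges quoted above and do not represent a fitted prescription.

```python
def wdrc_output(l_in, knee=45.0, ratio=2.0, linear_gain_db=30.0):
    """Output level (dB SPL) of a simple WDRC stage for input level l_in (dB SPL)."""
    if l_in <= knee:
        # Below the knee-point the aid behaves as a linear amplifier.
        return l_in + linear_gain_db
    # Above the knee-point each extra dB of input adds only 1/ratio dB of output.
    return knee + linear_gain_db + (l_in - knee) / ratio

def insertion_gain(l_in, **kwargs):
    """Gain in dB delivered at a given input level."""
    return wdrc_output(l_in, **kwargs) - l_in

for level in (30.0, 45.0, 65.0, 85.0):
    print(f"input {level:4.0f} dB SPL: output {wdrc_output(level):5.1f} dB SPL, "
          f"gain {insertion_gain(level):4.1f} dB")
```

With these settings the gain is 30 dB for soft inputs and falls to 10 dB at 85 dB SPL.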
Noise-Induced Hearing Loss
Damage to OHCs from excessive noise exposure causes a loss of the cochlear amplifier gain. The damage threshold follows an equal-energy hypothesis: the total acoustic energy dose determines damage:
$$E = I \times t = \frac{p_{\text{rms}}^2}{\rho_0 c} \times t$$
NIOSH recommends an exposure limit of 85 dBA for 8 hours, with a 3 dB exchange rate: each 3 dB increase halves the permissible duration (since intensity doubles). At 100 dBA, the permissible exposure time is only 15 minutes. (The OSHA permissible exposure limit is less strict: 90 dBA for 8 hours with a 5 dB exchange rate.)
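The halving rule translates into a one-line formula, $T = T_{\text{ref}} / 2^{(L - L_{\text{ref}})/\Delta}$. The sketch below uses the 85 dB, 8 hour, 3 dB figures quoted above as defaults; the function name is hypothetical, and the parameters can be changed to compare other exposure criteria.

```python
def permissible_minutes(level_db, limit_db=85.0, reference_hours=8.0, exchange_db=3.0):
    """Permissible daily exposure (minutes) under an equal-energy style rule."""
    return reference_hours * 60.0 / 2.0 ** ((level_db - limit_db) / exchange_db)

for level in (85.0, 88.0, 94.0, 100.0):
    print(f"{level:5.0f} dB: {permissible_minutes(level):6.1f} min")
```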
7. Historical Development
- Hermann von Helmholtz (1863): Proposed the resonance theory of hearing, in which the BM acts as a series of tuned resonators, each responding to a specific frequency. This was the first physical theory of pitch perception.
- Georg von Békésy (1928–1960, Nobel 1961): Demonstrated the traveling wave on the basilar membrane using cadaver cochleae and stroboscopic illumination. Showed that the passive BM has poor frequency selectivity.
- Thomas Gold (1948): Predicted that the cochlea must contain an active amplifier to explain its extraordinary sensitivity and frequency selectivity. Largely ignored for 30 years until confirmed experimentally.
- David Kemp (1978): First detected otoacoustic emissions, providing direct evidence for the active cochlear process and launching a clinical revolution in hearing screening.
- William Brownell (1985): Discovered that OHCs change length in response to electrical stimulation (electromotility), providing the cellular basis for the cochlear amplifier.
8. Python Simulations
Basilar Membrane Map, Traveling Wave, Hair Cell Gating, and Audiogram
Middle Ear Matching, Cochlear Compression, DPOAE, and Auditory Filters
Chapter Summary
- Sound waves obey $\partial^2 p/\partial x^2 = (1/c^2)\partial^2 p/\partial t^2$ with intensity $I = p_{\text{rms}}^2/(\rho c)$. The decibel scale compresses the ear's 120 dB dynamic range.
- Basilar membrane tonotopy follows $f(x) = f_{\max} e^{-x/d}$, creating a logarithmic frequency-to-place map. The traveling wave peaks at the characteristic place.
- Hair cell transduction uses tip-link gating springs: $P_{\text{open}} = 1/(1+e^{-\kappa_g z (X - X_0) / k_BT})$ with characteristic displacement ~1 nm. Adaptation via myosin motors maintains sensitivity.
- Cochlear amplifier: OHC electromotility ($\Delta L/L = \alpha \Delta V$ from prestin) provides 40–60 dB active gain, enabling detection of sub-picometre displacements.
- Otoacoustic emissions: the active cochlea emits sound at $f_{\text{dp}} = 2f_1 - f_2$, enabling clinical hearing screening.
- Applications: cochlear implants, hearing aids, and noise-exposure limits, all grounded in quantitative biophysics.