Neural Coding
Spike trains, rate coding, temporal coding, and the information-theoretic foundations of neural communication
How Neurons Encode Information
The brain processes information through electrical impulses called action potentials, or spikes. Understanding how sensory stimuli, motor commands, and cognitive states are represented in the patterns of these spikes is the central question of neural coding. Adrian's pioneering recordings in 1926 demonstrated that stimulus intensity is encoded in firing rate, launching a century of investigation into the neural code.
This chapter explores two fundamental coding paradigms — rate coding and temporal coding — and applies Shannon's information theory to quantify how much information neural responses carry about stimuli. We derive key results for spike train statistics, mutual information, and population coding efficiency.
1. Spike Trains and Point Processes
A neuron's output is a sequence of action potentials — stereotyped voltage pulses of roughly 1 ms duration and ~100 mV amplitude. Because individual spikes are nearly identical, all information is carried in their timing. We represent a spike train as a sum of Dirac delta functions:
The Spike Train Representation
Given spike times $\{t_1, t_2, \ldots, t_n\}$, the spike train is:
$$\rho(t) = \sum_{i=1}^{n} \delta(t - t_i)$$
The instantaneous firing rate is obtained by averaging over many trials (ensemble average) or by smoothing with a kernel $K(t)$:
$$r(t) = \langle \rho(t) \rangle = \lim_{\Delta t \to 0} \frac{\langle n(t, t+\Delta t) \rangle}{\Delta t}$$
For a Poisson process with constant rate $\lambda$, the interspike intervals (ISIs) follow an exponential distribution: $P(\tau) = \lambda e^{-\lambda \tau}$ with mean $\langle \tau \rangle = 1/\lambda$ and coefficient of variation $\text{CV} = 1$.
The Fano factor $F = \sigma_n^2 / \langle n \rangle$ measures spike count variability. For a Poisson process, $F = 1$. Cortical neurons typically show $F \approx 1\text{--}1.5$, suggesting near-Poisson variability, though this is debated. Refractoriness introduces negative correlations that reduce $F$ below 1 at short timescales.
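The Poisson predictions ($\text{CV} = 1$, $F = 1$) are easy to check numerically. The sketch below is a toy simulation with assumed rate and window parameters, not data from any recording: it draws exponential ISIs, builds spike trains, and computes both statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters for a hypothetical homogeneous Poisson neuron:
# rate lam (Hz), observation window T (s), number of repeated trials.
lam, T, n_trials = 20.0, 10.0, 2000

def poisson_spike_train(lam, T, rng):
    """Draw exponential ISIs, accumulate spike times, keep those before T."""
    isis = rng.exponential(1.0 / lam, size=int(3 * lam * T) + 50)
    times = np.cumsum(isis)
    return times[times < T]

trains = [poisson_spike_train(lam, T, rng) for _ in range(n_trials)]

# Coefficient of variation of ISIs: std(tau) / mean(tau); Poisson -> 1.
all_isis = np.concatenate([np.diff(t) for t in trains])
cv = all_isis.std() / all_isis.mean()

# Fano factor of spike counts across trials: var(n) / mean(n); Poisson -> 1.
counts = np.array([len(t) for t in trains])
fano = counts.var() / counts.mean()

print(f"CV = {cv:.3f}, Fano factor = {fano:.3f}")
```

Both values land near 1 up to sampling error; adding a refractory period to the ISI draw would pull both below 1, as the text notes.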
2. Rate Coding
Rate coding posits that information is carried in the mean firing rate over some time window. Adrian and Zotterman (1926) showed that muscle spindle afferents increase their firing rate proportionally to applied force. The neural response function, or tuning curve, relates stimulus features to mean firing rate.
Derivation 1: Tuning Curves and the Cosine Model
Georgopoulos et al. (1982) found that motor cortex neurons have cosine tuning to movement direction. For a neuron with preferred direction $\theta_{\text{pref}}$:
$$r(\theta) = r_0 + r_{\max} \cos(\theta - \theta_{\text{pref}})$$
where $r_0$ is the baseline rate and $r_{\max}$ is the modulation depth. For a population of $N$ neurons with uniformly distributed preferred directions, the population vector estimate of movement direction is:
$$\hat{\theta}_{\text{pop}} = \arg\left(\sum_{i=1}^{N} r_i \, e^{i \theta_{\text{pref},i}}\right)$$
As $N \to \infty$, this estimator converges to the true direction with variance scaling as $\sigma^2 \propto 1/N$. The population vector averages out noise across neurons, achieving a population-level signal-to-noise ratio that grows as $\sqrt{N}$.
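A minimal sketch of population vector decoding under the stated assumptions (cosine tuning, uniformly spaced preferred directions, independent Poisson spike counts); all parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: r(theta) = r0 + rmax*cos(theta - theta_pref),
# with preferred directions uniform on [0, 2*pi).
N, r0, rmax, T = 200, 10.0, 8.0, 1.0
theta_pref = np.linspace(0.0, 2 * np.pi, N, endpoint=False)

def decode(theta_true):
    rates = r0 + rmax * np.cos(theta_true - theta_pref)  # cosine tuning curves
    counts = rng.poisson(rates * T)                      # noisy spike counts
    z = np.sum(counts * np.exp(1j * theta_pref))         # population vector
    return np.angle(z)                                   # arg of the sum

theta_true = 1.0
estimates = np.array([decode(theta_true) for _ in range(1000)])
err = np.angle(np.exp(1j * (estimates - theta_true)))    # wrapped angular error
print(f"mean |error| = {np.abs(err).mean():.4f} rad over {len(estimates)} trials")
```

Rerunning with larger $N$ shrinks the error roughly as $1/\sqrt{N}$, matching the variance scaling above.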
Sensory neurons display diverse tuning curve shapes: Gaussian tuning in orientation selectivity ($r(\theta) = r_{\max} \exp[-((\theta - \theta_0)/\sigma)^2 / 2]$), sigmoidal tuning in intensity coding, and bell-shaped tuning in frequency selectivity. The choice of tuning curve shape profoundly affects the information capacity of the neural population.
3. Temporal Coding
Temporal coding proposes that precise spike timing carries information beyond what is captured by firing rate. Evidence comes from multiple systems: auditory neurons phase-lock to sound waveforms with sub-millisecond precision, retinal ganglion cells encode visual stimuli with precise first-spike latencies, and hippocampal place cells exhibit theta-phase precession.
Derivation 2: Spike Timing Precision and the Victor-Purpura Metric
To quantify temporal coding, we need a distance metric between spike trains. The Victor-Purpura (1996) metric defines the distance as the minimum cost to transform one spike train into another using three operations:
- Insert a spike: cost = 1
- Delete a spike: cost = 1
- Move a spike by $\Delta t$: cost = $q |\Delta t|$
The parameter $q$ (units: 1/time) sets the temporal resolution. When $q \to 0$, the metric counts only spike number differences (rate code). When $q \to \infty$, it counts coincident spikes only (temporal code). The metric is computed via dynamic programming with complexity $O(n_1 n_2)$ where $n_1, n_2$ are spike counts:
$$D(i,j) = \min\begin{cases} D(i-1, j) + 1 \\ D(i, j-1) + 1 \\ D(i-1, j-1) + q|t_i^{(1)} - t_j^{(2)}| \end{cases}$$
Applying this metric to neural data and varying $q$ reveals the timescale at which temporal information becomes relevant. For cortical neurons, $q_{\text{opt}} \sim 10\text{--}100$ s$^{-1}$, corresponding to temporal precision of 10–100 ms.
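The recursion above translates directly into code. A minimal implementation of the $O(n_1 n_2)$ dynamic program, run on illustrative spike times to exhibit the two limits of $q$:

```python
import numpy as np

def victor_purpura(t1, t2, q):
    """Victor-Purpura distance between two sorted spike-time lists (seconds).

    Edit operations: insert or delete a spike (cost 1), or shift a spike
    by dt (cost q*|dt|). Computed by O(n1*n2) dynamic programming.
    """
    n1, n2 = len(t1), len(t2)
    D = np.zeros((n1 + 1, n2 + 1))
    D[:, 0] = np.arange(n1 + 1)        # base case: delete every spike of train 1
    D[0, :] = np.arange(n2 + 1)        # base case: insert every spike of train 2
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            D[i, j] = min(D[i - 1, j] + 1,                                   # delete
                          D[i, j - 1] + 1,                                   # insert
                          D[i - 1, j - 1] + q * abs(t1[i - 1] - t2[j - 1]))  # shift
    return D[n1, n2]

a = [0.10, 0.25, 0.40]   # illustrative spike times (s)
b = [0.12, 0.40]

print(victor_purpura(a, b, q=0.0))   # q -> 0 limit: |n1 - n2| = 1
print(victor_purpura(a, b, q=1e6))   # q -> inf limit: only 0.40 s coincides, cost 3
```

Sweeping $q$ between these extremes and asking where stimulus discriminability peaks is how the optimal timescale $q_{\text{opt}}$ is found in practice.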
3.1 Phase Coding in the Hippocampus
Hippocampal place cells fire at progressively earlier phases of the theta rhythm (6–10 Hz) as an animal traverses a place field — the phenomenon of theta phase precession discovered by O'Keefe and Recce (1993). The phase encodes position within the place field:
$$\phi(x) = \phi_0 - 2\pi \frac{x - x_{\text{start}}}{x_{\text{end}} - x_{\text{start}}}$$
This dual code (firing rate encodes location, phase encodes position within the field) provides a higher-resolution spatial signal than rate alone, with theoretical precision improvements of up to an order of magnitude.
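Because the phase-position relation is linear, it can be inverted to read position back out of the firing phase. A small sketch with assumed field boundaries (0 to 0.5 m) and $\phi_0 = 2\pi$ at field entry:

```python
import numpy as np

# Assumed place field from x_start = 0 m to x_end = 0.5 m; firing phase
# starts at phi0 = 2*pi on field entry and precesses through one theta cycle.
x_start, x_end, phi0 = 0.0, 0.5, 2 * np.pi

def theta_phase(x):
    """Theta phase (rad) at which the cell fires, for position x in the field."""
    return phi0 - 2 * np.pi * (x - x_start) / (x_end - x_start)

def position_from_phase(phi):
    """Invert the linear phase code to recover position within the field."""
    return x_start + (phi0 - phi) * (x_end - x_start) / (2 * np.pi)

x = 0.3
phi = theta_phase(x)
print(f"x = {x} m -> phase = {phi:.3f} rad -> decoded x = {position_from_phase(phi):.3f} m")
```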
4. Information Theory of Neural Coding
Shannon's information theory provides a rigorous framework for quantifying neural coding efficiency. The mutual information between stimulus $S$ and response $R$ is:
$$I(S; R) = H(R) - H(R|S) = \sum_{s,r} P(s,r) \log_2 \frac{P(s,r)}{P(s)P(r)}$$
Derivation 3: Information in a Poisson Spike Count
Consider a neuron with Poisson statistics whose rate $\lambda(s)$ depends on the stimulus $s$. The spike count $n$ in a window $T$ follows $P(n|s) = \frac{(\lambda(s) T)^n e^{-\lambda(s) T}}{n!}$. The response entropy is:
$$H(R) = -\sum_n P(n) \log_2 P(n)$$
where $P(n) = \langle P(n|s) \rangle_s$. The noise entropy is:
$$H(R|S) = \left\langle -\sum_n P(n|s) \log_2 P(n|s) \right\rangle_s$$
For a Poisson distribution with mean $\mu = \lambda T$, the entropy is approximately:
$$H_{\text{Poisson}}(\mu) \approx \frac{1}{2} \log_2(2\pi e \mu) \quad \text{for } \mu \gg 1$$
The mutual information is then bounded by the difference between the total response variability and the intrinsic noise: $I(S;R) \leq \frac{1}{2}\log_2\left(1 + \frac{\text{Var}[\lambda]}{\langle \lambda \rangle}\, T\right)$. This shows that information grows only logarithmically with the time window $T$, and likewise logarithmically with the signal-to-noise ratio $\text{Var}[\lambda]/\langle \lambda \rangle$.
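For a small discrete stimulus set, the entropies in this derivation can be evaluated exactly rather than approximated. The sketch below uses hypothetical rates and window length and computes $I(S;R) = H(R) - H(R|S)$ for four equiprobable stimuli:

```python
import numpy as np
from math import lgamma

# Assumed setup: 4 equiprobable stimuli with rates lambda(s) in Hz, window T (s).
rates = np.array([5.0, 10.0, 20.0, 40.0])
T = 0.5
mus = rates * T                       # Poisson means lambda(s) * T
ns = np.arange(200)                   # truncate the count distribution

# P(n|s), computed in log space for numerical stability.
log_fact = np.array([lgamma(n + 1) for n in ns])
log_pmf = ns[None, :] * np.log(mus[:, None]) - mus[:, None] - log_fact[None, :]
P_n_given_s = np.exp(log_pmf)         # shape (stimuli, counts)

P_s = np.full(len(mus), 0.25)         # equiprobable stimuli: H(S) = 2 bits
P_n = P_s @ P_n_given_s               # marginal response distribution P(n)

def entropy_bits(p):
    p = p[p > 1e-300]                 # drop zero-probability counts
    return -np.sum(p * np.log2(p))

H_R = entropy_bits(P_n)                                           # response entropy
H_R_given_S = np.sum(P_s * np.array([entropy_bits(p) for p in P_n_given_s]))
I = H_R - H_R_given_S                                             # mutual information
print(f"H(R) = {H_R:.3f} bits, H(R|S) = {H_R_given_S:.3f} bits, I = {I:.3f} bits")
```

Since $H(S) = 2$ bits, the computed $I$ must fall below 2; Poisson count overlap between neighboring rates is what keeps it there.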
Derivation 4: The Direct Method for Entropy Estimation
Strong et al. (1998) developed the "direct method" for estimating information in spike trains without assuming a coding model. Discretize time into bins of width $\Delta t$, encoding the spike train as a binary string. The total entropy of response words of length $L$ is:
$$H_{\text{total}}(L) = -\sum_{\text{words } w} P(w) \log_2 P(w)$$
The noise entropy uses stimulus-locked responses:
$$H_{\text{noise}}(L) = \left\langle -\sum_w P(w|s) \log_2 P(w|s) \right\rangle_s$$
The information rate is:
$$\dot{I} = \frac{H_{\text{total}}(L) - H_{\text{noise}}(L)}{L \cdot \Delta t} \quad \text{(bits/s)}$$
Extrapolation to $L \to \infty$ and correction for finite data bias (quadratic extrapolation in $1/N$) yields the true information rate. Fly H1 neurons transmit ~90 bits/s, approaching 50% of the channel capacity, indicating remarkably efficient coding.
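A toy version of the direct method on synthetic data. Here an assumed Bernoulli rate profile stands in for repeated stimulus trials, and no bias correction or word-length extrapolation is applied, so the resulting rate is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic repeated-trial data: a stimulus-locked spiking probability per
# dt = 5 ms bin, identical across trials (the "frozen stimulus").
dt = 0.005
n_bins, n_trials, L = 400, 600, 4
p_spike = 0.3 * 0.5 * (1 + np.sin(np.linspace(0, 8 * np.pi, n_bins)))
spikes = rng.random((n_trials, n_bins)) < p_spike     # (trials, bins), binary

def word_entropy(words):
    """Entropy (bits) of the empirical distribution over binary words."""
    _, counts = np.unique(words, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

starts = range(0, n_bins - L + 1, L)

# Total entropy: pool length-L words across all times and trials.
H_total = word_entropy(np.concatenate([spikes[:, i:i + L] for i in starts]))

# Noise entropy: across-trial word entropy at each stimulus-locked time, averaged.
H_noise = np.mean([word_entropy(spikes[:, i:i + L]) for i in starts])

rate = (H_total - H_noise) / (L * dt)
print(f"H_total = {H_total:.3f}, H_noise = {H_noise:.3f} bits per word")
print(f"information rate ~ {rate:.1f} bits/s (finite-data biased, no extrapolation)")
```

A real analysis would repeat this for several $L$ and data fractions, then extrapolate as described above before trusting the number.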
Derivation 5: Fisher Information and the Cramér-Rao Bound
Fisher information quantifies the precision of neural population codes. For a population of $N$ neurons with tuning curves $f_i(s)$ and independent Poisson noise, the Fisher information is:
$$J(s) = \sum_{i=1}^{N} \frac{[f_i'(s)]^2}{f_i(s)}$$
The Cramér-Rao bound sets a lower limit on the variance of any unbiased estimator $\hat{s}$ of the stimulus:
$$\text{Var}(\hat{s}) \geq \frac{1}{J(s)}$$
For Gaussian tuning curves $f_i(s) = A \exp[-(s - s_i)^2 / (2\sigma^2)]$ with uniformly spaced preferred stimuli (spacing $\Delta s$), the total Fisher information is:
$$J_{\text{total}} = \frac{N A}{\sigma^2} \cdot C(\sigma / \Delta s)$$
where $C$ is a coverage factor that depends on the ratio of tuning width to spacing. Optimal coding requires $\sigma / \Delta s \approx 1$, balancing resolution against coverage. This result predicts that wider tuning curves are preferred when noise is high.
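For dense uniform coverage, the sum defining $J(s)$ can be evaluated directly and compared against the continuum approximation $J \approx A\sqrt{2\pi}/(\sigma\,\Delta s)$ (obtained by replacing the sum with an integral; stated here as an assumption consistent with the coverage-factor form above). A sketch with illustrative parameters:

```python
import numpy as np

# Hypothetical population: Gaussian tuning f_i(s) = A*exp(-(s - s_i)^2 / (2*sigma^2)),
# preferred stimuli uniformly spaced by ds, independent Poisson noise.
A, sigma, ds = 30.0, 1.0, 0.5
centers = np.arange(-10.0, 10.0 + ds, ds)

def fisher_info(s):
    """J(s) = sum_i f_i'(s)^2 / f_i(s) for independent Poisson neurons."""
    f = A * np.exp(-(s - centers) ** 2 / (2 * sigma ** 2))
    fp = -(s - centers) / sigma ** 2 * f      # derivative df_i/ds
    return np.sum(fp ** 2 / f)

J = fisher_info(0.0)
approx = A * np.sqrt(2 * np.pi) / (sigma * ds)     # continuum approximation
crb = 1.0 / np.sqrt(J)                             # Cramer-Rao bound on std(s_hat)
print(f"J(0) = {J:.1f} (continuum approx {approx:.1f}); std(s_hat) >= {crb:.4f}")
```

With $\sigma/\Delta s = 2$ here, coverage is dense and the sum matches the continuum value closely; widening $\Delta s$ past $\sigma$ opens gaps where $J(s)$ dips.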
5. Historical Development
- 1926: Edgar Adrian records single-unit activity from sensory neurons, discovers rate coding — firing rate increases with stimulus intensity (Nobel Prize, 1932).
- 1948: Claude Shannon publishes "A Mathematical Theory of Communication," providing the mathematical framework for quantifying information.
- 1959: Hubel and Wiesel discover orientation-selective neurons in primary visual cortex, revealing feature-selective coding (Nobel Prize, 1981).
- 1982: Georgopoulos introduces the population vector model for motor cortex direction coding.
- 1993: O'Keefe and Recce discover theta phase precession in hippocampal place cells, demonstrating temporal coding of position.
- 1996: Rieke, Warland, de Ruyter van Steveninck, and Bialek publish "Spikes," establishing information-theoretic analysis of neural coding.
- 1998: Strong et al. develop the direct method for estimating information rates in spike trains, showing fly neurons operate near channel capacity.
6. Applications
Brain-Computer Interfaces
Population vector decoding and Bayesian decoders use tuning curve models to interpret neural activity for prosthetic control. Understanding rate vs. temporal coding determines optimal decoding strategies and required recording bandwidth.
Cochlear Implants
Temporal coding principles guide the design of electrode stimulation patterns. Phase-locking in auditory nerve fibers up to ~5 kHz means temporal coding strategies can restore pitch perception for lower frequencies.
Retinal Prosthetics
Information-theoretic analysis determines the number of electrodes needed to restore useful vision. Fisher information calculations predict the spatial resolution achievable with a given electrode array geometry.
Machine Learning
Neural coding principles inspire efficient encoding in spiking neural networks. Population coding and sparse coding ideas have influenced compressed sensing algorithms and neuromorphic computing architectures.
7. Computational Exploration
Neural Coding: Spike Trains, Tuning Curves, and Information Theory
Chapter Summary
- Spike trains are modeled as point processes; Poisson statistics give $\text{CV} = 1$ and Fano factor $F = 1$, roughly consistent with cortical recordings ($F \approx 1\text{--}1.5$).
- Rate coding via cosine tuning curves enables population vector decoding with precision scaling as $1/\sqrt{N}$.
- Temporal coding carries information in spike timing; the Victor-Purpura metric quantifies temporal precision at different timescales.
- Mutual information $I(S;R) = H(R) - H(R|S)$ quantifies stimulus information in neural responses, growing logarithmically with integration time.
- Fisher information sets fundamental limits on decoding precision via the Cramér-Rao bound: $\text{Var}(\hat{s}) \geq 1/J(s)$.