Differential Entropy
Extending Shannon entropy to continuous random variables, with the surprising twist that the entropy can be negative and that the Gaussian is uniquely entropy-maximizing.
7.1 Definition
For a continuous random variable \(X\) with probability density function \(f(x)\), the differential entropy is:
\[ h(X) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx \]
The integral is taken over the support of \(f\). Like discrete entropy, it measures the "spread" or "uncertainty" of a distribution, but with a critical difference: differential entropy can be negative, and its value depends on the units of measurement.
The non-negativity of discrete entropy relied on \(0 \le p_i \le 1\) implying \(-\log p_i \ge 0\). For a continuous pdf, \(f(x)\) can exceed 1, making \(-\log f(x)\) negative, so \(h(X)\) can be negative.
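A quick numerical illustration of this (a sketch using NumPy/SciPy; the distribution choice is arbitrary): a uniform density on \([0, \tfrac{1}{2}]\) has \(f(x)=2>1\) on its support, so its differential entropy \(\log\tfrac{1}{2}\) is negative.

```python
import numpy as np
from scipy import stats

# Uniform on [0, 1/2]: f(x) = 2 on the support, so h = log(1/2) < 0.
# scipy's .entropy() returns differential entropy in nats.
h_narrow = stats.uniform(loc=0.0, scale=0.5).entropy()

# Numerical check via the defining integral -∫ f log f dx (Riemann sum):
x = np.linspace(0.0, 0.5, 100_000)
f = np.full_like(x, 2.0)                    # density of Uniform[0, 1/2]
dx = x[1] - x[0]
h_numeric = -np.sum(f * np.log(f)) * dx

print(h_narrow, h_numeric)                  # both close to log(0.5) ≈ -0.693
```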
7.2 Closed-Form Results
| Distribution | Parameters | Differential Entropy h(X) |
|---|---|---|
| Uniform | \([a,b]\) | \(\log(b-a)\) |
| Gaussian | \(\mathcal{N}(\mu,\sigma^2)\) | \(\frac{1}{2}\log(2\pi e\,\sigma^2)\) |
| Exponential | \(\text{Exp}(\lambda)\) | \(1 - \log\lambda\) |
| Laplace | \(\text{Lap}(\mu,b)\) | \(1 + \log(2b)\) |
| Multivariate Gaussian | \(\mathcal{N}(\boldsymbol\mu,\Sigma)\) | \(\frac{n}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma\) |
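The table can be sanity-checked against SciPy, whose continuous distributions expose differential entropy (in nats) through their `.entropy()` method; the parameter values below are arbitrary.

```python
import numpy as np
from scipy import stats

sigma, lam, b = 1.5, 2.0, 0.7
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # arbitrary 2x2 covariance (n = 2)

# (scipy's .entropy(), closed-form expression from the table), all in nats
checks = {
    "Uniform[0,3]": (stats.uniform(0, 3).entropy(), np.log(3.0)),
    "Gaussian":     (stats.norm(0, sigma).entropy(),
                     0.5 * np.log(2 * np.pi * np.e * sigma**2)),
    "Exponential":  (stats.expon(scale=1 / lam).entropy(), 1 - np.log(lam)),
    "Laplace":      (stats.laplace(0, b).entropy(), 1 + np.log(2 * b)),
    "MV Gaussian":  (stats.multivariate_normal(mean=np.zeros(2), cov=Sigma).entropy(),
                     (2 / 2) * np.log(2 * np.pi * np.e)
                     + 0.5 * np.log(np.linalg.det(Sigma))),
}
for name, (numeric, formula) in checks.items():
    print(f"{name:12s} scipy={numeric:.6f}  table={formula:.6f}")
```

Note that SciPy parameterizes the exponential by its scale \(1/\lambda\), so `scale=1/lam` is needed to match the table's \(\text{Exp}(\lambda)\).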
7.3 Maximum Entropy Distributions
Subject to a variance constraint \(\operatorname{Var}(X)=\sigma^2\), the Gaussian maximizes differential entropy:
\[ h(X) \;\le\; \tfrac{1}{2}\log(2\pi e\,\sigma^2) \]
with equality iff \(X\) is Gaussian with variance \(\sigma^2\).
Proof sketch (relative entropy): For any \(f\) with variance \(\sigma^2\), let \(g=\mathcal{N}(0,\sigma^2)\). By non-negativity of KL divergence:
\[ 0 \;\le\; D_{\mathrm{KL}}(f\|g) = -h(X) - \int f\log g\,dx \]
Since \(\log g(x) = -\frac{x^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\), the integral \(\int f\log g\,dx\) depends only on the second moment of \(f\), which equals \(\sigma^2\) by assumption. Hence \(\int f\log g\,dx = \int g\log g\,dx = -h(g)\), and the inequality becomes \(0 \le -h(X) + h(g)\), i.e. \(h(X) \le h(g)\).
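The inequality can be spot-checked numerically. A sketch (in nats) comparing the Gaussian against a Laplace and a Uniform distribution, each scaled to the same unit variance:

```python
import numpy as np

# All three distributions are scaled so Var(X) = 1; entropies in nats.
h_gauss   = 0.5 * np.log(2 * np.pi * np.e)   # N(0,1)

b = 1 / np.sqrt(2)                           # Laplace(0,b) has variance 2b^2 = 1
h_laplace = 1 + np.log(2 * b)

a = np.sqrt(3)                               # Uniform[-a,a] has variance a^2/3 = 1
h_uniform = np.log(2 * a)

print(h_gauss, h_laplace, h_uniform)         # ~1.419 > ~1.347 > ~1.242
```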
This result is why Gaussian noise is the worst case in channel capacity problems: it is the "most uncertain" noise at a given power level.
Other Maximum Entropy Results
- Support \([a,b]\) constraint: the Uniform distribution maximizes \(h(X)\)
- Mean constraint \(E[X]=\mu\), support \([0,\infty)\): the Exponential distribution
- No constraint (only normalization): \(h(X)\to\infty\), so no maximum exists
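The bounded-support case can be illustrated on \([0,1]\), where the Uniform achieves \(h=\log 1 = 0\) and any other distribution falls below it (a sketch; the Beta parameters are arbitrary examples):

```python
from scipy import stats

# On [0,1] the Uniform (= Beta(1,1)) maximizes differential entropy at log(1) = 0 nats.
h_unif = stats.uniform(0, 1).entropy()
h_beta = stats.beta(2, 2).entropy()       # mass concentrated near 1/2
h_arcs = stats.beta(0.5, 0.5).entropy()   # mass pushed toward the endpoints

print(h_unif, h_beta, h_arcs)             # 0.0, then two negative values
```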
7.4 Properties of Differential Entropy
Scaling
\[ h(aX) = h(X) + \log|a| \]
Stretching a distribution by a factor \(a\) adds \(\log|a|\) to its entropy.
Translation Invariance
\[ h(X + c) = h(X) \]
Shifting does not change the shape of the distribution, so the entropy is unchanged.
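Both properties can be checked with a Gaussian, since \(aX + c \sim \mathcal{N}(c, a^2\sigma^2)\) (a sketch; the constants are arbitrary):

```python
import numpy as np
from scipy import stats

sigma, a, c = 1.3, 2.5, 7.0
h_X = stats.norm(0, sigma).entropy()          # differential entropy in nats

# Scaling: aX ~ N(0, (a*sigma)^2), so h(aX) should equal h(X) + log|a|
h_aX = stats.norm(0, abs(a) * sigma).entropy()

# Translation: X + c ~ N(c, sigma^2), so h(X + c) should equal h(X)
h_Xc = stats.norm(c, sigma).entropy()

print(h_aX - h_X, np.log(abs(a)))             # equal
print(h_Xc - h_X)                             # 0.0
```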
Chain Rule
\[ h(X,Y) = h(X) + h(Y|X) \]
Same form as the discrete case; conditioning reduces entropy, \(h(X|Y) \le h(X)\).
Mutual Information
\[ I(X;Y) = h(X) - h(X|Y) \ge 0 \]
Always non-negative, even though the individual differential entropies can be negative.
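For jointly Gaussian \((X,Y)\) with correlation \(\rho\), mutual information has the closed form \(I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)\). A sketch verifying it via the equivalent identity \(I = h(X) + h(Y) - h(X,Y)\):

```python
import numpy as np
from scipy import stats

rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])   # unit-variance bivariate Gaussian

h_X  = stats.norm(0, 1).entropy()
h_Y  = stats.norm(0, 1).entropy()
h_XY = stats.multivariate_normal(mean=[0, 0], cov=cov).entropy()

I = h_X + h_Y - h_XY                       # equals h(X) - h(X|Y)
print(I, -0.5 * np.log(1 - rho**2))        # both ~0.511 nats; non-negative
```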
Relation to Discrete Entropy
Quantizing \(X\) to bins of width \(\Delta\) gives a discrete variable \(X^\Delta\) with:
\[ H(X^\Delta) \approx h(X) - \log\Delta \]
As \(\Delta\to 0\) (finer quantization), \(H(X^\Delta)\to\infty\). The divergence rate is \(-\log\Delta\) and the "remainder" is \(h(X)\). This is why \(h(X)\) can be negative while \(H\) is always non-negative.
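A numerical sketch of this relation for a standard Gaussian (bin probabilities come from CDF differences; the grid range \([-10, 10]\) is an arbitrary choice wide enough to capture essentially all the mass):

```python
import numpy as np
from scipy import stats

h = stats.norm().entropy()                      # h(X) = 0.5*log(2*pi*e) in nats
for Delta in (0.5, 0.1, 0.01):
    edges = np.arange(-10.0, 10.0 + Delta, Delta)
    p = np.diff(stats.norm().cdf(edges))        # P(X lands in each bin)
    p = p[p > 0]
    H = -np.sum(p * np.log(p))                  # discrete entropy H(X^Delta)
    print(f"Delta={Delta:5.2f}  H={H:.4f}  h - log(Delta)={h - np.log(Delta):.4f}")
```

The two columns agree more closely as \(\Delta\) shrinks, matching \(H(X^\Delta) \approx h(X) - \log\Delta\).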
Python: Computing & Visualizing Differential Entropy
Compare analytical formulas against numerical estimates from histograms. Six panels: entropy vs parameter, PDFs, bar comparison, Gaussian maximality, negativity region, and the discrete-to-continuous relation.
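A minimal sketch of the core comparison, without the plotting: a histogram-based entropy estimator (bin count, sample size, and seed are arbitrary choices) checked against the closed-form values from the table above.

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_entropy(samples, bins=200):
    """Estimate h(X) in nats: discrete entropy of the histogram + log(bin width)."""
    counts, edges = np.histogram(samples, bins=bins)
    width = edges[1] - edges[0]
    p = counts / counts.sum()
    p = p[p > 0]                       # drop empty bins (0 log 0 = 0)
    return -np.sum(p * np.log(p)) + np.log(width)

n = 200_000
h_gauss_est = histogram_entropy(rng.normal(0.0, 1.0, n))
h_expon_est = histogram_entropy(rng.exponential(1.0, n))

print("Gaussian   :", h_gauss_est, " analytic:", 0.5 * np.log(2 * np.pi * np.e))
print("Exponential:", h_expon_est, " analytic:", 1.0)   # 1 - log(1)
```

The \(+\log(\text{width})\) correction is exactly the discrete-to-continuous relation of section 7.4: the histogram's discrete entropy overshoots \(h(X)\) by \(-\log\Delta\).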