Channel Capacity
Every noisy channel has a maximum rate, its capacity, below which arbitrarily reliable communication is possible. Above it, errors are unavoidable. This is Shannon's most profound result.
1. Mutual Information
The mutual information \(I(X;Y)\) between a channel input \(X\) and output \(Y\) measures how much knowing \(Y\) reduces uncertainty about \(X\), or equivalently how much information is successfully transmitted:
\[ I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) \]
Alternatively, mutual information is a Kullback-Leibler divergence:
\[ I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} = D_{\mathrm{KL}}(p(x,y) \,\|\, p(x)p(y)) \]
Since KL divergence is always non-negative, \(I(X;Y) \geq 0\) with equality iff \(X\) and \(Y\) are independent (the channel is completely useless). Mutual information is symmetric: \(I(X;Y) = I(Y;X)\).
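The KL form above translates directly into code. The following is a minimal sketch (the function name and dict-based representation are choices made here, not from the source) that computes \(I(X;Y)\) from a joint distribution and checks it against the BSC value derived later in the section:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, from a joint distribution given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # KL divergence between p(x,y) and the product p(x)p(y)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# BSC with crossover 0.1 and uniform input: p(x,y) = p(x) * p(y|x)
pe = 0.1
joint = {(0, 0): 0.5 * (1 - pe), (0, 1): 0.5 * pe,
         (1, 0): 0.5 * pe,       (1, 1): 0.5 * (1 - pe)}
print(mutual_information(joint))  # 1 - H_b(0.1) ≈ 0.531
```

An independent pair (a uniform joint over all four outcomes) gives exactly 0, matching the equality condition.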
Figure: entropy Venn diagram. Mutual information \(I(X;Y)\) is the "overlap" region: information shared between \(X\) and \(Y\).
2. Channel Capacity
A discrete memoryless channel (DMC) is specified by a conditional distribution \(p(y|x)\) over output alphabet \(\mathcal{Y}\) given input \(\mathcal{X}\). The channel is "memoryless" in that each use is independent of all others.
The channel capacity is defined as:
\[ C = \max_{p(x)} I(X;Y) \]
The maximum is over all input distributions \(p(x)\).
Capacity has units of bits per channel use. It is a property of the channel alone, independent of what we transmit. The maximizing input distribution is the one that best exploits the channel structure.
For symmetric channels (BSC, BEC), the maximum is achieved by the uniform input distribution, which makes the derivation particularly clean.
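The maximization in the capacity definition can be verified numerically. A brute-force sketch (grid search rather than a formal optimizer; the function names are choices made here) confirms that for the BSC the uniform input distribution attains the maximum:

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def bsc_mutual_info(p, pe):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with P(X=1)=p and crossover pe."""
    q = p*(1-pe) + (1-p)*pe           # P(Y=1)
    return binary_entropy(q) - binary_entropy(pe)

pe = 0.1
# Grid search over input distributions p(X=1) = 0, 0.001, ..., 1
best_p = max((i/1000 for i in range(1001)),
             key=lambda p: bsc_mutual_info(p, pe))
print(best_p, bsc_mutual_info(best_p, pe))  # 0.5, ≈ 0.531
```

For asymmetric channels the maximizer is generally not uniform, and iterative schemes such as the Blahut-Arimoto algorithm are used instead of grid search.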
3. Binary Symmetric Channel (BSC)
In the BSC, each transmitted bit is flipped independently with probability \(p_e\). By symmetry, the capacity-achieving input is the uniform distribution \(p(X=0) = p(X=1) = 1/2\), giving:
\[ C_{\text{BSC}}(p_e) = 1 - H_b(p_e) = 1 + p_e \log_2 p_e + (1-p_e)\log_2(1-p_e) \]
Intuition: with uniform inputs the output is also uniform, so \(H(Y) = 1\) bit, but the channel contributes \(H(Y|X) = H_b(p_e)\) bits of noise uncertainty (even knowing the input, the output is still random). Their difference is the information that got through.
Example capacities:
- \(p_e = 0\): \(C = 1\) bit (perfect channel)
- \(p_e = 0.1\): \(C \approx 0.531\) (lossy)
- \(p_e = 0.25\): \(C \approx 0.189\) (very noisy)
- \(p_e = 0.5\): \(C = 0\) (useless: the output is independent of the input)
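The values above can be reproduced with a few lines of Python (a minimal sketch; the function name is a choice made here):

```python
import math

def bsc_capacity(pe):
    """C = 1 - H_b(pe), in bits per channel use."""
    if pe in (0.0, 1.0):
        return 1.0  # H_b vanishes at the endpoints
    return 1.0 + pe*math.log2(pe) + (1-pe)*math.log2(1-pe)

for pe in (0.0, 0.1, 0.25, 0.5):
    print(f"pe={pe}: C={bsc_capacity(pe):.3f}")
```

Note that \(C_{\text{BSC}}(1) = 1\) as well: a channel that flips every bit is perfectly invertible.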
4. Binary Erasure Channel (BEC)
In the BEC, each transmitted bit is either received correctly or declared "erased" (with probability \(\varepsilon\)): the receiver knows when a bit was lost but not what its value was. Unlike the BSC, errors are always detectable. The capacity is:
\[ C_{\text{BEC}}(\varepsilon) = 1 - \varepsilon \]
Derivation: with input distribution \(P(X=1) = p\), the ternary output satisfies \(H(Y) = H_b(\varepsilon) + (1-\varepsilon) H_b(p)\) (by the chain rule: first reveal whether an erasure occurred, then the bit value), while \(H(Y|X) = H_b(\varepsilon)\). Hence \(I(X;Y) = (1-\varepsilon) H_b(p) = (1-\varepsilon) H(X)\), which is maximized at \(p = 1/2\), giving \(C = 1-\varepsilon\).
The BEC is the canonical channel for analyzing iterative codes such as LDPC codes: the erasure pattern is known, so decoding reduces to solving a linear system over \(\mathbb{F}_2\). Polar codes, too, are most easily analyzed on the BEC, where their capacity-achieving behavior is simplest to prove.
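The derivation above can be checked numerically. A minimal sketch (function names chosen here) computes \(I(X;Y)\) for the BEC via the chain-rule decomposition of \(H(Y)\):

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def bec_mutual_info(p, eps):
    """I(X;Y) for a BEC: ternary output {0, erasure, 1}, P(X=1)=p."""
    # H(Y) via the chain rule: erasure indicator first, then the bit value
    h_y = binary_entropy(eps) + (1 - eps) * binary_entropy(p)
    h_y_given_x = binary_entropy(eps)
    return h_y - h_y_given_x          # = (1 - eps) * H_b(p)

eps = 0.3
print(bec_mutual_info(0.5, eps))  # maximized at p = 1/2, giving 1 - 0.3 = 0.7
```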
5. Shannon's Noisy Channel Coding Theorem
Theorem (Noisy Channel Coding Theorem; Shannon, 1948)
For a discrete memoryless channel with capacity \(C\):
- Achievability: For any rate \(R < C\) and any \(\epsilon > 0\), there exists a sequence of codes with rate \(\geq R\) and block error probability \(\leq \epsilon\).
- Converse: For any rate \(R > C\), every code sequence has block error probability bounded away from zero.
The proof of achievability uses random coding: pick a codebook by drawing \(2^{nR}\) codewords uniformly from the typical input sequences. With high probability over the choice of codebook, the maximum-likelihood decoder achieves vanishing error as \(n \to \infty\).
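The random-coding argument can be made concrete with a toy Monte Carlo experiment. The following sketch (all parameters here are illustrative choices, not from the source) draws a small random codebook for a BSC and decodes by minimum Hamming distance, which is maximum-likelihood for this channel:

```python
import random

random.seed(0)
n, M, pe, trials = 20, 8, 0.05, 400   # blocklength, codebook size, BSC noise
# Random codebook: M codewords drawn uniformly from {0,1}^n
codebook = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

errors = 0
for _ in range(trials):
    m = random.randrange(M)
    # Pass the codeword through the BSC: flip each bit with probability pe
    received = [bit ^ (random.random() < pe) for bit in codebook[m]]
    # Minimum-distance decoding (maximum-likelihood for the BSC)
    decoded = min(range(M), key=lambda k: hamming(codebook[k], received))
    errors += (decoded != m)
print(errors / trials)
```

Here the rate is \(\log_2 8 / 20 = 0.15\), well below \(C \approx 0.714\), so the empirical block error rate is small even for this unoptimized random code; the theorem says it can be driven to zero as \(n \to \infty\).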
The proof of the converse uses Fano's inequality: if the block error probability \(P_e^{(n)}\) is small, then \(H(M|Y^n) \leq 1 + P_e^{(n)} nR =: n\epsilon_n\) (the message \(M\) is nearly determined by the received sequence), and therefore \(nR = H(M) \leq I(X^n; Y^n) + n\epsilon_n \leq nC + n\epsilon_n\). Letting \(n \to \infty\) with \(P_e^{(n)} \to 0\) gives \(R \leq C\).
What makes this theorem astonishing is its purely existential nature: it guarantees that good codes exist at every rate below \(C\), but does not say how to construct them efficiently. Finding practical capacity-achieving codes (Turbo, LDPC, Polar) took another 45 years.
6. The Converse: Above C, Errors are Unavoidable
The converse to the noisy channel coding theorem is as important as achievability. It says there is a hard wall at capacity: no matter how clever the encoder and decoder, if the transmission rate \(R > C\), there is no escaping a positive error probability.
The key tool is Fano's inequality:
\[ H(X|Y) \leq H_b(P_e) + P_e \log_2(|\mathcal{X}| - 1) \]
This bounds the residual uncertainty about \(X\) given \(Y\) by a function of the error probability \(P_e\). If \(P_e \to 0\), then \(H(X|Y) \to 0\): the receiver recovers the message. The converse then follows by applying data-processing and the definition of capacity.
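Fano's inequality can be verified numerically. A minimal sketch (function names chosen here) compares \(H(X|Y)\) against the Fano bound for a uniform-input BSC, where the optimal guess of \(X\) from \(Y\) errs with probability \(P_e = p_e\):

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def cond_entropy(joint):
    """H(X|Y) in bits for a joint distribution {(x, y): prob}."""
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    return -sum(p * math.log2(p / py[y]) for (x, y), p in joint.items() if p > 0)

def fano_bound(pe, alphabet_size):
    return binary_entropy(pe) + pe * math.log2(alphabet_size - 1)

# BSC, uniform input, crossover 0.1: guessing X = Y errs with P_e = 0.1
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(cond_entropy(joint), fano_bound(0.1, 2))
```

For a binary alphabet the \(\log_2(|\mathcal{X}|-1)\) term vanishes and the bound reduces to \(H_b(P_e)\); in this example it holds with equality.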
Together, achievability and the converse give a complete operational characterization of capacity: \(C\) is the supremum of all achievable rates. This is one of the deepest single results in all of engineering science.
Python Simulation: BSC and BEC Capacity
Four panels: (1) BSC capacity vs crossover probability, (2) BEC capacity vs erasure probability, (3) error rate of repetition coding vs code rate on BSC, (4) mutual information I(X;Y) as a function of input distribution for various BSC parameters.
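A self-contained sketch of the computation behind the four panels is below (pure Python, no plotting; the function names and parameter grids are choices made here, and matplotlib calls could be layered on top of the computed arrays):

```python
import math

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def bsc_capacity(pe):            # Panel 1: C = 1 - H_b(pe)
    return 1.0 - hb(pe)

def bec_capacity(eps):           # Panel 2: C = 1 - eps
    return 1.0 - eps

def repetition_error(n, pe):     # Panel 3: odd-length repetition, majority vote
    # Block error = P(more than n/2 of the n copies are flipped)
    return sum(math.comb(n, k) * pe**k * (1-pe)**(n-k)
               for k in range(n//2 + 1, n + 1))

def bsc_mi(p, pe):               # Panel 4: I(X;Y) vs input distribution
    q = p*(1-pe) + (1-p)*pe      # P(Y=1)
    return hb(q) - hb(pe)

pes = [i/100 for i in range(51)]
panel1 = [bsc_capacity(pe) for pe in pes]
panel2 = [bec_capacity(e) for e in pes]
panel3 = [(1/n, repetition_error(n, 0.1)) for n in (1, 3, 5, 7, 9, 11)]
panel4 = {pe: [bsc_mi(i/100, pe) for i in range(101)] for pe in (0.0, 0.1, 0.25)}

print(f"C_BSC(0.1) = {panel1[10]:.3f}")   # ≈ 0.531
print(f"C_BEC(0.3) = {panel2[30]:.3f}")   # 0.700
print(f"5-repetition block error at pe=0.1: {panel3[2][1]:.4f}")
```

Panel 3 illustrates the gap the theorem closes: repetition coding drives errors down only by sending the rate to zero, while capacity-approaching codes keep the rate near \(C\).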