Part I: Foundations
The three chapters of Part I build the entire edifice of classical information theory from a single 1948 paper.
Shannon's 1948 Paper
In 1948, Claude Shannon published “A Mathematical Theory of Communication” in the Bell System Technical Journal, in two parts (July and October). In it Shannon accomplished something remarkable: he gave a precise, quantitative answer to the question “what is information?” and then derived the fundamental limits governing every communication system and every compression algorithm — limits that stand to this day.
Shannon's key insight was to separate the meaning of a message from its statistical structure. Information, in Shannon's framework, is purely a matter of probability: an outcome that occurs with probability \(p\) carries \(-\log_2 p\) bits of information, regardless of what it says. From this simple definition he derived two theorems of extraordinary depth: the source coding theorem (compression cannot beat entropy) and the noisy channel coding theorem (reliable communication is possible at any rate below channel capacity).
Part I develops these three pillars in order — entropy, source coding, channel capacity — building the mathematical scaffolding you need to appreciate every later development in the course.
Core Equations of Part I
Shannon Entropy
\( H(X) = -\sum_{i} p_i \log_2 p_i \)
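The entropy formula is easy to sanity-check numerically. A minimal Python sketch (not part of the course code; the function name is our own) using only the standard library:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i log2(p_i), in bits.
    Terms with p_i = 0 contribute nothing, by the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin: about 0.469 bits
print(entropy([0.25] * 4))   # uniform over 4 outcomes: 2.0 bits
```

Note how the biased coin carries less than one bit per toss: predictable sources are cheap to describe, which is exactly what the source coding bound below formalizes.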
Source Coding Bound
\( \bar{\ell} \geq H(X) \)
Channel Capacity
\( C = \max_{p(x)} I(X;Y) \)
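The maximization in the capacity formula can be carried out by brute force for a simple channel. The sketch below (our own illustration, assuming a binary symmetric channel with crossover probability `eps`) grid-searches over input distributions and compares against the closed form \(C = 1 - H_2(\varepsilon)\) derived in Chapter 3:

```python
import math

def h2(p):
    """Binary entropy function H_2(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_info(px1, eps):
    """I(X;Y) for a binary symmetric channel with crossover eps and
    input distribution P(X=1) = px1, via I(X;Y) = H(Y) - H(Y|X)."""
    py1 = px1 * (1 - eps) + (1 - px1) * eps  # P(Y = 1)
    return h2(py1) - h2(eps)                 # H(Y|X) = h2(eps) for every x

eps = 0.11
# Coarse grid search over input distributions p(x):
capacity = max(bsc_mutual_info(px / 1000, eps) for px in range(1001))
print(capacity, 1 - h2(eps))  # the two values agree
```

The maximum is attained at the uniform input \(P(X=1) = 1/2\), which the grid hits exactly, so the search reproduces the closed form.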
Chapters
Chapter 1: Shannon Entropy →
Information content of events, the entropy formula H(X) = −∑ pᵢ log₂ pᵢ, its axiomatic derivation, properties, joint and conditional entropy, and the chain rule.
- Information content I(x) = −log₂ p(x)
- Entropy from axioms
- Properties: non-negativity, max, concavity
- Joint & conditional entropy
- Chain rule H(X,Y) = H(X) + H(Y|X)
Chapter 2: Source Coding Theorem →
Shannon's fundamental theorem on lossless compression: no code can compress below H(X) bits/symbol, yet this bound is asymptotically achievable.
- Kraft inequality
- Shannon's source coding theorem
- Typical sequences & AEP
- Proof sketch
- Preview of rate-distortion theory
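Two of the topics above can be previewed in a few lines of Python. The sketch below (our own, not the chapter's code) checks the Kraft inequality for the Shannon code lengths \(\lceil -\log_2 p_i \rceil\) and compares the resulting average length against the entropy bound:

```python
import math

def kraft_sum(lengths):
    """Kraft sum: a prefix code with codeword lengths l_i exists
    iff sum_i 2^{-l_i} <= 1."""
    return sum(2.0 ** -l for l in lengths)

# A source with dyadic probabilities and its Shannon code lengths:
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [math.ceil(-math.log2(p)) for p in probs]   # [1, 2, 3, 3]

H = -sum(p * math.log2(p) for p in probs)             # entropy: 1.75 bits
avg_len = sum(p * l for p, l in zip(probs, lengths))  # average codeword length

print(kraft_sum(lengths) <= 1)  # True: a prefix code with these lengths exists
print(H, avg_len)               # avg_len >= H, per the source coding theorem
```

Because these probabilities are dyadic (powers of 1/2), the average length meets the entropy bound with equality; for general distributions the Shannon code is within one bit of \(H(X)\).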
Chapter 3: Channel Capacity →
Mutual information, the capacity of noisy channels (BSC, BEC), and the noisy channel coding theorem: reliable communication is possible at any rate below capacity.
- Mutual information I(X;Y)
- Channel capacity C = max I(X;Y)
- Binary symmetric channel
- Binary erasure channel
- Shannon's noisy channel coding theorem
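The binary erasure channel in the list above is the one case where the capacity maximization can be done almost by inspection: an erasure destroys all information about the input, and a non-erasure reveals it completely, so \(I(X;Y) = (1-\varepsilon)H(X)\). A sketch under that observation (our own illustration):

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_mutual_info(px1, eps):
    """I(X;Y) = H(X) - H(X|Y) for a binary erasure channel that erases
    the input with probability eps and passes it through otherwise.
    Given an erasure, X is still distributed as p(x), so H(X|Y='?') = H(X);
    given a non-erasure, X is determined, so H(X|Y=y) = 0."""
    return (1 - eps) * h2(px1)

eps = 0.2
capacity = max(bec_mutual_info(px / 1000, eps) for px in range(1001))
print(capacity)  # 0.8 = 1 - eps, achieved by the uniform input
```

The grid search confirms that the uniform input achieves \(C = 1 - \varepsilon\): of every 1000 channel uses, on average 200 are wasted on erasures and the rest each carry one full bit.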
Prerequisites
Part I requires only probability theory at the undergraduate level. You should be comfortable with:
- Discrete probability distributions, expectation, and variance
- Basic combinatorics and the binomial theorem
- Logarithms (natural and base-2) and their algebraic properties
- The definition of a limit and simple asymptotic notation