Algorithmic Complexity
What does it mean for a string to be random? Kolmogorov, Chaitin, and Rissanen answer with the shortest program, the halting probability, and the best model.
Part Overview
Shannon entropy measures the average information content of a source — a probability distribution over many outcomes. But what about the information content of a single string? The sequence 0101010101... and a truly random bit string of the same length have the same Shannon entropy under a fair-coin model, yet the first is clearly structured while the second is not.
Kolmogorov complexity (independently discovered by Kolmogorov, Solomonoff, and Chaitin in the 1960s) resolves this by defining the complexity of a string \(x\) as the length of the shortest program that outputs \(x\): \(K(x) = \min\{|p| : U(p) = x\}\). A random string has no short description — its complexity equals its length.
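The definition is abstract, but a general-purpose compressor gives a concrete, computable upper bound on description length: if a compressor shrinks \(x\), then \(x\) has a short description. A minimal sketch in Python, using zlib's output size as a crude stand-in for the shortest program:

```python
import os
import zlib

def compressed_size(s: bytes) -> int:
    """Length of a zlib description of s -- an upper bound proxy for K(s)."""
    return len(zlib.compress(s, level=9))

structured = b"01" * 5000          # 10,000 bytes of pure pattern
random_str = os.urandom(10_000)    # 10,000 bytes with no structure

# The structured string has a short description; the random one does not.
print(compressed_size(structured))  # tiny compared to 10,000
print(compressed_size(random_str))  # close to (or slightly above) 10,000
```

Real compressors only ever give upper bounds on \(K\): they can certify that a string is simple, but never that it is random.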
The catch: \(K(x)\) is uncomputable. There is no algorithm that, given \(x\), outputs \(K(x)\). This is not a practical limitation but a fundamental one, rooted in the halting problem. Chaitin exploited this to give an information-theoretic proof of Gödel’s incompleteness theorems.
The Minimum Description Length (MDL) principle operationalizes these ideas for statistics: given competing models for data, choose the one that minimizes the total description length of model plus data. MDL gives model selection a rigorous foundation that requires no prior, yet it is deeply connected to both Kolmogorov complexity and Bayesian inference.
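The two-part score \(L(M) + L(\text{data}\mid M)\) can be made concrete for choosing a polynomial degree. The sketch below is one crude instantiation, assuming Gaussian residuals and charging the standard \(\tfrac{1}{2}\log_2 n\) bits per real parameter; it is not Rissanen's refined (normalized maximum likelihood) form:

```python
import numpy as np

def mdl_score(x, y, degree):
    """Crude two-part MDL score in bits: parameter cost + data cost.

    Assumes Gaussian residuals; additive constants shared by all
    candidate models are dropped, so only differences are meaningful.
    """
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    l_model = 0.5 * (degree + 1) * np.log2(n)  # (1/2) log n bits per coefficient
    l_data = 0.5 * n * np.log2(rss / n)        # -log2 likelihood, up to a constant
    return l_model + l_data

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 3 * x**2 - x + rng.normal(0, 0.1, size=x.size)  # true model: degree 2

best = min(range(9), key=lambda d: mdl_score(x, y, d))
print(best)  # typically 2: higher degrees fit slightly better but cost more bits
```

Maximum likelihood alone would always prefer degree 8; the \(L(M)\) term is what penalizes the extra coefficients that only model noise.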
Central Concepts
Kolmogorov Complexity
\( K(x) = \min_{U(p)=x}|p| \)
MDL Score
\( L(M) + L(\text{data}|M) \)
Chaitin’s Omega
\( \Omega = \sum_{U(p)\downarrow} 2^{-|p|} \)
Complexity vs Compressibility
Chapters
Chapter 10: Kolmogorov Complexity
The length of the shortest program that outputs a given string. Measures the intrinsic randomness of individual objects, independent of probability distributions. Uncomputable, yet profound.
Chapter 11: Minimum Description Length
Rissanen's MDL principle: the best model for data is the one that compresses it most. A rigorous formulation of Occam's razor that connects information theory, statistics, and model selection.
Chapter 12: Gödel, Turing & Information
Chaitin's information-theoretic proof of Gödel's theorem. The halting problem implies \(K(x)\) is uncomputable. \(\Omega\) — the probability that a random program halts — is a maximally random real number.
Connection to Shannon Entropy
The two notions of information are deeply related. For an i.i.d. source with distribution \(p\), the expected Kolmogorov complexity of a string of length \(n\) satisfies:
\[ \mathbb{E}[K(X^n)] = n\,H(X) + O(\log n) \]
Shannon entropy governs the average complexity; Kolmogorov complexity measures the complexity of individual strings. The two converge in expectation but can differ wildly for specific strings.
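This convergence can be seen empirically with a compressor standing in for \(K\): the compressed length per symbol of an i.i.d. Bernoulli(\(p\)) bit string tracks the binary entropy \(H(p)\). A small experiment (zlib will not reach the entropy exactly, but the trend is clear):

```python
import math
import zlib
import numpy as np

def entropy(p):
    """Binary entropy H(p) in bits per symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def compressed_bits_per_symbol(p, n=200_000, seed=0):
    """Compress n i.i.d. Bernoulli(p) bits (packed into bytes) with zlib."""
    rng = np.random.default_rng(seed)
    bits = rng.random(n) < p
    packed = np.packbits(bits).tobytes()
    return 8 * len(zlib.compress(packed, level=9)) / n

for p in (0.5, 0.25, 0.1):
    rate = compressed_bits_per_symbol(p)
    print(f"p={p}: H={entropy(p):.3f} bits, compressed ~ {rate:.3f} bits/symbol")
```

At \(p = 0.5\) the string is incompressible (rate near 1 bit/symbol); as \(p\) moves away from 1/2 the achieved rate falls with \(H(p)\), and it never falls meaningfully below it — compression below the entropy is impossible in expectation.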
Prerequisites
- Part I (Shannon entropy, source coding, data compression)
- Basic computability theory: Turing machines, halting problem (for Ch 10, 12)
- Elementary statistics: model fitting, likelihood, polynomial regression (for Ch 11)