Part IV β€Ί Chapter 10

Kolmogorov Complexity

The information content of an individual string β€” defined as the length of its shortest description. Uncomputable, yet the foundation of algorithmic information theory.

10.1 Definition

Fix a universal Turing machine \(U\). The Kolmogorov complexity (also called algorithmic or descriptional complexity) of a binary string \(x\) is:

\[ K(x) = \min\bigl\{|p| : U(p) = x\bigr\} \]

length of shortest binary program that makes \(U\) output \(x\)

Intuitively: how many bits are needed to describe \(x\) completely? A string like \(\underbrace{00\cdots0}_{1000}\) can be described by a short program (β€œprint 1000 zeros”), so \(K(x) \ll |x|\). A genuinely random string has no shorter description than itself, so \(K(x) \approx |x|\).

Prefix-free version: One often uses the prefix-free Kolmogorov complexity \(K^*(x)\) where only self-delimiting programs are allowed (analogous to prefix-free codes). This satisfies the Kraft inequality\(\sum_x 2^{-K^*(x)} \le 1\), making it closer to a probability measure.

10.2 Invariance Theorem

Does \(K(x)\) depend on the choice of reference machine \(U\)? The Invariance Theorem says no β€” up to a constant:

\[ K_U(x) \le K_V(x) + c_{U,V} \]

where \(c_{U,V}\) is a constant depending only on the two machines, not on \(x\)

Proof: Given a program \(p_V\) that computes\(x\) on \(V\), we can prepend a fixed interpreter \(I_{V\to U}\)that simulates \(V\) on \(U\): the combined program \(I_{V\to U} p_V\)runs on \(U\) and has length \(|I_{V\to U}| + |p_V|\) where\(|I_{V\to U}|\) is the fixed constant \(c_{U,V}\).

The invariance theorem justifies speaking of the Kolmogorov complexity β€” it is well-defined up to a fixed additive constant. For large strings, this constant is negligible.

10.3 Uncomputability

Theorem: There is no algorithm that computes\(K(x)\) for all \(x\).

Proof (Berry paradox variant): Suppose\(K\) were computable. Consider the string:

β€œthe lexicographically first string \(x\) with \(K(x) > n\)”

This description has length \(O(\log n)\) bits (encoding \(n\)) β€” but by definition the string has \(K > n\). For large enough \(n\),\(O(\log n) < n\), a contradiction. Hence \(K\) is uncomputable.

Practically: we can approximate \(K(x)\) from above by compression algorithms (zlib, bz2, lzma). These give upper bounds: if a compressor produces a \(k\)-byte output, then \(K(x) \le k + c\) where \(c\) is the decompressor size.

10.4 Relation to Shannon Entropy

For a random variable \(X\) drawn i.i.d. with distribution \(P\):

\[ \mathbb{E}[K(X)] = H(X) + O(1) \]

More precisely, for a sequence of \(n\) i.i.d. samples:

\[ \frac{1}{n}\mathbb{E}[K(X^n)] \xrightarrow{n\to\infty} H(X) \]

Shannon Entropy

  • β€£ Defined for probability distributions
  • β€£ Average over many outcomes
  • β€£ Computable
  • β€£ Distribution-dependent

Kolmogorov Complexity

  • β€£ Defined for individual strings
  • β€£ Complexity of a single object
  • β€£ Uncomputable (only approximable)
  • β€£ Distribution-free / objective

10.5 Random Strings

A string \(x\) of length \(n\) is called algorithmically random (orKolmogorov random) if \(K(x) \ge n - O(1)\) β€” there is no significantly shorter description.

Key facts:

  • β€£At least \(2^n - 2^{n-c+1}\) strings of length \(n\) have\(K(x) \ge n-c\). Almost all strings are random.
  • β€£Random strings have no exploitable pattern: they pass all effective statistical tests. Martin-LΓΆf randomness coincides with Kolmogorov randomness.
  • β€£You can never prove a specific string is Kolmogorov random (uncomputability), yet almost all strings are. This is analogous to GΓΆdel incompleteness.

Python: Estimating Complexity via Compression

Uses zlib, bz2, and lzma as upper-bound estimators for \(K(x)\). Tests 8 string types (zeros, alternating, Fibonacci word, powers of 2, English text, source code, random bits, random DNA) of length 1000. Four plots: compressed sizes, compression ratios, K vs length for random/structured, and distributions of K over 500 strings of each type.

Python
script.py195 lines

Click Run to execute the Python code

Code will be executed with Python 3 on the server