Algorithmic Information Theory
Kolmogorov Complexity
The length of the shortest program that produces a string is its true information content. And you can never compute it.
Consider these two 100-character strings:
String A
01010101010101010101010101010101010101010101010101...
100 characters
String B
k8mP2xL9qN4jR7vT3wY6bH1dF5gK0sC8mU2pX9nJ4rV7tW3yB...
100 characters
Both strings are 100 characters long. But they are fundamentally different in their information content.
String A
Program: print("01" * 50) (~16 characters)
String B
Program: print("k8mP2xL9qN4...") (~105 characters)
The Kolmogorov complexity of a string
is the length of the shortest program that produces it.
Measured in program characters, K("0101...") is about 16, while K("k8mP...") is about 105: describing the random-looking string costs roughly as much as the string itself.
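K itself is uncomputable, but any real compressor gives an upper bound on it: the "program" is the decompressor plus the compressed bytes. A minimal sketch using Python's zlib; the random string here is a stand-in for the "k8mP..." string above, and the exact byte counts depend on the zlib version, so only the comparison matters:

```python
import random
import string
import zlib

# String A: a repeating pattern.
a = "01" * 50

# String B: 100 pseudo-random alphanumeric characters (an illustrative
# stand-in for the "k8mP..." string; the exact characters do not matter).
rng = random.Random(0)
b = "".join(rng.choice(string.ascii_letters + string.digits) for _ in range(100))

# Compressed size upper-bounds Kolmogorov complexity (up to a constant).
size_a = len(zlib.compress(a.encode()))
size_b = len(zlib.compress(b.encode()))

print(size_a, size_b)  # the patterned string compresses far better
```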
Patterns vs Randomness
Some strings contain patterns that allow compression. Others are incompressible: their shortest description is the string itself.
Try it: Compression Visualizer
Raw String
ababababababab
Length: 14 characters
Bits needed: ~112 bits
Shortest Program
repeat("ab", 7)
Program length: 15 characters
For a 14-character string the program is no shorter, but the savings scale: repeat("ab", 7000) is 18 characters yet generates 14,000, a compression of more than 99%.
Low Complexity
- Repeating patterns
- Mathematical sequences
- Structured data
- Natural language text
Short programs generate long outputs
High Complexity
- Random noise
- Encrypted data
- Compressed files
- Most strings!
No shorter description than the string itself
Key insight: Most strings of length n have Kolmogorov complexity close to n. Random strings are the norm; compressible strings are rare.
Berry's Paradox
Here is where things get strange. Consider this phrase:
"The smallest positive integer not
definable in under sixty letters"
This phrase has 57 letters.
If such a number exists, we just defined it in under 60 letters - but it was supposed to be undefinable in under 60 letters!
Step Through the Paradox
The Setup
Consider all positive integers. Some can be described in English with very few words.
Example
"one" (3 letters), "twelve" (6 letters), "one million" (10 letters)
The Squeeze
There are only finitely many phrases of under sixty letters, but infinitely many positive integers. So some integers have no such description, and among them there must be a smallest one.
The Contradiction
The 57-letter phrase above picks out exactly that integer, defining it in under sixty letters after all.
The connection to Kolmogorov complexity: Replace "definable in N letters" with "generatable by a program of N bits."
"The first string with Kolmogorov complexity greater than 1000 bits" - but this description itself is much shorter than 1000 bits!
Programs That Generate Strings
The same string can be generated by many programs. Kolmogorov complexity is defined by the shortest one.
Compare Program Lengths
0000...0000 (100 zeros)
Direct
print("0000...0000"), about 109 characters: the entire string as a literal
Shortest
print("0" * 100), about 16 characters
Patterned strings compress well: short programs can generate long outputs.
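The gap between the direct and the patterned program can be measured with both written out in full (the figures in the text are approximate):

```python
zeros = "0" * 100

# Direct program: embed the whole string as a literal.
direct = f'print("{zeros}")'

# Patterned program: exploit the repetition.
patterned = 'print("0" * 100)'

print(len(direct), len(patterned))  # 109 16
```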
Pi: 1000 digits ~ 16 bits (computable by a short formula)
Text: 1000 chars ~ 600 bits (language has redundancy)
Random: 1000 chars ~ 8000 bits (incompressible)
Why K(x) Is Uncomputable
Here is the deep result: there is no algorithm that computes Kolmogorov complexity. The proof uses self-reference, like Berry's paradox.
The Uncomputability Proof
Step 1: Assume K(x) is computable
COMPLEXITY(x)
Black box that computes K(x)
Suppose we have a program COMPLEXITY(x) that returns the Kolmogorov complexity of any string x.
Step 2: Build a Berry program
Define BERRY(n): enumerate all strings in order and return the first one with COMPLEXITY(x) > n.
Step 3: Derive the contradiction
BERRY(n) can be written in about c + log2(n) bits (a fixed routine plus the digits of n), yet it outputs a string whose shortest description supposedly exceeds n bits. For large enough n, c + log2(n) < n, so the output's shortest description is both longer and shorter than n. The assumption fails: no such COMPLEXITY can exist.
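The structure of the contradiction can be written out as code. The `complexity` argument below is hypothetical (no real implementation of it can exist, which is the whole point); passing a toy computable measure in its place, such as plain string length, shows the search mechanics:

```python
from itertools import count, product

def strings_in_order():
    """Yield all binary strings in length-lexicographic order:
    "", "0", "1", "00", "01", ..."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def berry(n, complexity):
    # If `complexity` really computed K(x), this routine, whose own
    # description is only O(log n) bits, would output a string with
    # K(x) > n, contradicting the definition of K for large n.
    for s in strings_in_order():
        if complexity(s) > n:
            return s

# Demonstration with a toy, genuinely computable measure: plain length.
print(berry(3, len))  # first string longer than 3 characters: "0000"
```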
What This Means
1. You can never know the true complexity of a string.
2. You can find upper bounds (by finding short programs) but never prove optimality.
3. The halting problem is embedded in complexity measurement.
4. "Is this string random?" is undecidable in general.
Solomonoff Induction & Occam's Razor
Despite being uncomputable, Kolmogorov complexity provides the foundation for optimal prediction.
Ray Solomonoff showed that if you weight hypotheses by 2^(-K), you get a universal prior that is optimal for prediction in a precise mathematical sense.
Complexity and Prior Probability
Prior Probability by Complexity
Solomonoff's Universal Prior: The probability of a hypothesis is inversely related to its Kolmogorov complexity.
P(H) ~ 2^(-K(H))
The Deep Connection
This is why Occam's Razor works! Simpler hypotheses:
- Have lower Kolmogorov complexity
- Deserve higher prior probability
- Will dominate given sufficient evidence
- Make predictions with less overfitting
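True Solomonoff induction is uncomputable, but its shape can be sketched over a toy hypothesis class. Here each hypothesis is "the data repeats pattern p forever", with an assumed description cost of 8 bits per pattern character; both the hypothesis class and the cost are illustrative choices, not part of Solomonoff's construction:

```python
from itertools import product

def predict_next(observed, max_pattern_len=4):
    """Weight each consistent 'repeat pattern p' hypothesis by
    2^-(8 * len(p)) and return the posterior of each next bit."""
    weights = {"0": 0.0, "1": 0.0}
    for k in range(1, max_pattern_len + 1):
        for bits in product("01", repeat=k):
            pattern = "".join(bits)
            stream = pattern * (len(observed) // k + 2)
            if stream.startswith(observed):      # hypothesis fits the data
                prior = 2.0 ** -(8 * k)          # ~ 2^(-description length)
                weights[stream[len(observed)]] += prior
    total = sum(weights.values())
    return {bit: w / total for bit, w in weights.items()}

# After seeing "01", the shortest consistent pattern ("01" itself)
# dominates the longer ones, so the prediction leans toward "0":
dist = predict_next("01")
print(dist["0"] > dist["1"])  # True
```

The shortest hypothesis gets exponentially more weight, which is exactly how Occam's Razor falls out of the universal prior.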
Implications
Defining Randomness
A string is random if its shortest description is roughly its own length. This incompressibility criterion gives a rigorous definition of randomness for an individual string, something classical probability theory cannot offer.
Compression Limits
Most strings cannot be compressed. ZIP files work because real data has patterns. Random data (like encrypted files) cannot be made smaller.
Machine Learning
Minimum Description Length (MDL) approximates Kolmogorov complexity for model selection. Prefer models that compress the data best.
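A toy version of the MDL idea, using a made-up two-part cost (8 bits per pattern character for the model, 16 bits per exception for the data given the model); real MDL coding schemes are more careful, but the selection principle is the same:

```python
def mdl_cost(pattern, data):
    """Two-part description length: bits for the model (the pattern)
    plus bits for each position where its prediction misses the data."""
    stream = (pattern * (len(data) // len(pattern) + 1))[: len(data)]
    mismatches = sum(1 for p, d in zip(stream, data) if p != d)
    return 8 * len(pattern) + 16 * mismatches

def best_model(data, max_len=4):
    # Candidate models: each prefix of the data, treated as a repeating pattern.
    candidates = [data[:k] for k in range(1, max_len + 1)]
    return min(candidates, key=lambda p: mdl_cost(p, data))

# "ab" repeated six times with one corrupted character: the two-character
# model plus one exception beats both shorter and longer candidates.
print(best_model("ababababXbab"))  # "ab"
```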
Fundamental Limits of Knowledge
Some questions about structure and randomness are forever undecidable. This is not a practical limitation but a mathematical impossibility.
The Paradox of Description
To describe the simplest description of something, you often need a description longer than the thing itself. Self-reference creates undecidability at the foundations of computation and information.
Information has an intrinsic complexity.
And we can never fully measure it.
References: Kolmogorov (1965), Solomonoff (1964), Chaitin (1966)