
The Idea That Can Measure Everything, Except Itself

Kolmogorov Complexity: What seems like a simple question about compression actually touches on the core limits of mathematics, shapes how we think about learning systems, and sparks surprisingly deep philosophical discussions.


Let's start with something small. Two strings of text:

String A:  ABABABABABABABABABABABABABAB
String B:  4c1j5b2p0cv4w1x8rx2y39umgw5q
  

Both are 28 characters. Both take up the same space on a hard drive. By most conventional measures, they're equivalent. But intuitively, you already know they're not. String A has a pattern. You could describe it in one sentence: "AB repeated 14 times." That description is far shorter than the string itself.

String B? You’d essentially have to write it out directly - there’s no obvious shortcut. The shortest description of that string is the string.

The Core Idea, Stated Plainly

Formal Definition:

Let U be a fixed universal Turing machine (the "universal computer" that Kolmogorov's theory relies on). The Kolmogorov Complexity of a string x is:

K_U(x) = min { |p| : U(p) = x }

Where:

  • p is a program (binary string),

  • ∣p∣ is its length,

  • U(p)=x means the program outputs x and halts.

Why the Machine Choice Doesn’t Matter Much

The invariance theorem: for any two universal machines U and V, there is a constant c, depending only on U and V and not on x, such that |K_U(x) − K_V(x)| ≤ c. This means even if we change the underlying computer model, the complexity only changes by a fixed constant. So we can treat Kolmogorov Complexity as an inherent property of the string.

The Kolmogorov Complexity of an object (a string, a number, an image, any piece of information) is defined as the length of the shortest computer program that outputs that object and then halts. If we call the object x, that length is its complexity K(x):

Intuition, not actual syntax:

K("ABABABABABABABABABABABABABAB")
  shortest program: print("AB" * 14)   ≈ 15 chars
  → K(x) is LOW: there's structure here

K("4c1j5b2p0cv4w1x8rx2y39umgw5q")
  shortest program: print("4c1j5b2p0cv4w1x8rx2y39umgw5q")
  → K(x) ≈ length of x itself
  → K(x) is HIGH: this is randomness

Low complexity indicates a simple pattern or explanation. High complexity means the string is its own shortest description. In information theory, a string with complexity equal to its length is called incompressible, which is the formal definition of random.

Basic Upper Bound

K(x) ≤ |x| + c

At worst, a program can simply print the string directly, so the complexity can never be much larger than the string itself.
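The bound is easy to check concretely. A minimal Python sketch of the "just print it" program (the constant c here is the overhead of this particular toy encoding, nothing more):

```python
def trivial_program(x: str) -> str:
    """A program that prints x verbatim: evidence that K(x) <= |x| + c."""
    return f"print({x!r})"

structured = "AB" * 14
random_ish = "4c1j5b2p0cv4w1x8rx2y39umgw5q"

# Both trivial programs are |x| + 9 characters long: the constant c is
# the cost of "print(", ")", and two quote characters.
print(len(trivial_program(structured)))   # 37 = 28 + 9
print(len(trivial_program(random_ish)))   # 37 = 28 + 9
```

The point is that this program always exists, so K(x) can never exceed |x| by more than a fixed constant; for structured strings, far shorter programs also exist.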

The Key Shift

Before Kolmogorov, randomness was defined by process - a sequence was random if it was produced by a fair coin flip. But that means HHHHHHHHHHHH is technically "possible" under a fair coin. It just has low probability.

Kolmogorov gives us a structural definition instead: a sequence is random if no program can describe it shorter than the sequence itself. It doesn't matter how it was produced. What matters is what it is.

A Few Real Examples

Object            K(x)              Why
000000000000      Very low          "N zeros" - one tiny program covers it.
3.14159265…       Low-medium        It's π - a short algorithm generates it forever.
k4xQ9mZ2pL7wR1s   ≈ its own length  No pattern, no shortcut. This is randomness.

The π example is interesting. Its digits appear random and pass randomness tests, yet π has low Kolmogorov Complexity since a simple algorithm can generate it. This illustrates the difference between statistical and Kolmogorov randomness; something can seem patternless but still have a simple structure.
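That gap is easy to demonstrate. Here is a short sketch using Gibbons' unbounded spigot algorithm: a few lines of code that can emit as many statistically random-looking digits of π as you like.

```python
def pi_digits(n):
    """First n decimal digits of pi, via Gibbons' unbounded spigot algorithm.
    A tiny program generating an endless 'random-looking' stream."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            digits.append(m)
            q, r, t, k, m, x = 10 * q, 10 * (r - m * t), t, k, (10 * (3 * q + r)) // t - 10 * m, x
        else:
            q, r, t, k, m, x = q * k, (2 * q + r) * x, t * x, k + 1, (q * (7 * k + 2) + r * x) // (t * x), x + 2
    return digits

print(pi_digits(10))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The digit stream passes statistical tests, yet the generator fits in a dozen lines: low K(x) hiding behind apparent randomness.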

A Fundamental Limitation: Uncomputability

Here's where it stops being a neat conceptual tool and becomes something uncomfortable.

Kolmogorov Complexity is uncomputable.

Connection to the Halting Problem

The uncomputability of Kolmogorov Complexity is closely tied to the Halting Problem.

The Halting Problem asks:
Given a program and an input, can we determine whether the program will eventually stop or run forever? Alan Turing proved that no algorithm can solve this problem for all possible programs. It is proven, mathematically and fundamentally, to be impossible to compute in general.

Suppose we had an algorithm that computes K(x). Then we could:

  1. Enumerate all programs of length ≤ n

  2. Run them and observe outputs

  3. Identify the shortest program that produces a given string

But this process requires knowing which programs halt. If we could compute Kolmogorov Complexity exactly, we would effectively solve the Halting Problem - which is impossible.

There is also a direct contradiction. Suppose K were computable, and write a program that searches through all strings in order and prints the first one whose complexity exceeds some large threshold L. That program can be written in far fewer than L characters (stating the threshold takes only about log L digits), yet it outputs a string whose shortest description is supposedly longer than L. The program itself describes the string in under L characters - impossible.
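The paradoxical program can be sketched in Python, with one giant caveat: K has to be passed in as a hypothetical parameter, because the real function cannot be written. The function name is illustrative.

```python
from itertools import count, product

def first_string_with_complexity_at_least(threshold, K):
    """If K were computable, this short program would return the first
    binary string (shortest first, then lexicographic) whose complexity
    is at least `threshold`. K is a stand-in parameter: no computable
    function can actually play that role."""
    for length in count(0):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if K(s) >= threshold:
                return s

# The paradox: this program's source is a few hundred characters plus
# roughly log10(threshold) digits - yet for a huge threshold it would
# return a string whose *shortest* description supposedly exceeds the
# threshold. The program is itself a shorter description. Contradiction.
```

With a toy stand-in for K (say, plain string length), the search machinery works fine; it is only the genuine K that can never be supplied.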

This resembles Berry's Paradox in algorithmic form: "the smallest positive integer not definable in under eleven words" - a phrase that defines it in just ten. It's about self-reference causing contradictions. Any system capable of fully measuring Kolmogorov Complexity ultimately contradicts itself.

"We can compute upper bounds on complexity. We can never verify the true minimum. Every compression is an approximation of something fundamentally out of reach."

~ Paraphrasing Li & Vitányi, An Introduction to Kolmogorov Complexity

What We Can Actually Do

Zip a file-the compressed size provides an upper bound on its Kolmogorov Complexity. Use a better algorithm for a tighter bound. But the true minimum? We can only approach it, never confirm it. Every compression algorithm is an attempt at something it can never fully achieve.
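A minimal sketch of that idea in Python, using zlib as the stand-in compressor (the function name is illustrative):

```python
import zlib

def k_upper_bound(s: str) -> int:
    """Compressed size in bytes: an upper bound on K(s), up to the fixed
    cost of the decompressor. Never a certificate of the true minimum."""
    return len(zlib.compress(s.encode(), 9))

structured = "AB" * 500                      # 1000 chars, pure pattern
random_ish = "4c1j5b2p0cv4w1x8rx2y39umgw5q"  # 28 chars, no visible pattern

print(k_upper_bound(structured))   # far below 1000: the pattern is found
print(k_upper_bound(random_ish))   # above 28: overhead, nothing to exploit
```

A better compressor would tighten the first bound further; no compressor, however good, can certify it has reached K(x).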

Incompressibility Theorem

K(x) ≥ n − c

(for all but at most a 2^(−c) fraction of strings of length n)

There are far fewer short programs than long strings. So most strings cannot be compressed - in fact, most strings are essentially random.

Where It Shows Up in the Real World

Biology - Your Genome Is a Compressed Program

The human genome encodes a 37-trillion-cell organism in about 750 megabytes, less than a standard HD movie. It describes the entire human body, from development to immune responses and neurological architecture, in a compact file.
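The arithmetic behind that figure is a quick sanity check - a rough estimate assuming 2 bits per base and ignoring metadata, error correction, and further compressibility:

```python
# Back-of-envelope: ~3.1 billion base pairs, each base one of four
# letters (A, C, G, T), so 2 bits per base.
base_pairs = 3.1e9
megabytes = base_pairs * 2 / 8 / 1e6  # bits -> bytes -> megabytes
print(round(megabytes))  # ≈ 775 MB, the "about 750 MB" ballpark above
```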

Researchers in computational biology apply Kolmogorov Complexity to compare species' genomes, track evolution, and explore why diverse organisms share similar genetic complexity. Asking "How compressible is this genome?" turns out to be a productive way to study how life stores and reuses information.

Machine Learning - Compression as Understanding

One way to interpret learning is that a model discovers a shorter description of the data. Instead of memorizing every example, it captures patterns that allow it to represent the data more efficiently.
This idea is the basis of the Minimum Description Length (MDL) principle-choose the hypothesis that makes the total length of the model and the encoded data as short as possible. It's like a mathematical version of Occam's Razor.
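A minimal MDL sketch: fit polynomials of increasing degree to noisy quadratic data and charge each model for its parameters as well as its errors. The two-part scoring rule below is a crude stand-in for a real MDL codelength, invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 2 * x**2 - x + 0.1 * rng.standard_normal(200)  # quadratic + noise

def mdl_score(degree: int) -> float:
    """Crude two-part code length in bits: cost of the model's parameters
    plus cost of the data given the model. A sketch, not a calibrated code."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n = len(y)
    data_bits = 0.5 * n * np.log2(np.mean(residuals**2))  # encode residuals
    model_bits = 0.5 * (degree + 1) * np.log2(n)          # encode parameters
    return data_bits + model_bits

best = min(range(9), key=mdl_score)
print(best)  # typically 2: extra degrees don't pay for their description length
```

A degree-8 polynomial fits the sample slightly better, but the shorter total description - model plus encoded errors - belongs to the hypothesis that actually generated the data. That is Occam's Razor as arithmetic.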

Anomaly Detection - Finding What Doesn't Compress

In cybersecurity and network monitoring, normal traffic has structure - it compresses. Malicious or anomalous traffic often doesn't fit learned patterns and therefore compresses poorly relative to baseline. Systems that flag "this data is harder to describe than expected" are doing Kolmogorov reasoning in practice, even without calling it that. The idea leaks into engineering even when nobody's tracking it back to its source.
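A minimal sketch of that reasoning, again with zlib as the compressor; the baseline traffic and function name are invented for illustration:

```python
import zlib

def compressed_size(b: bytes) -> int:
    return len(zlib.compress(b, 9))

def extra_description_cost(baseline: bytes, sample: bytes) -> int:
    """Extra compressed bytes the sample adds on top of the baseline.
    Structure shared with the baseline is nearly free; data that doesn't
    fit the learned pattern compresses poorly."""
    return compressed_size(baseline + sample) - compressed_size(baseline)

baseline = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 40
normal = b"GET /about.html HTTP/1.1\r\nHost: example.com\r\n"
odd = b"\x07kq9zLmX2vK4pR8sT1nB6wD3yF"

print(extra_description_cost(baseline, normal))  # small: fits the pattern
print(extra_description_cost(baseline, odd))     # large: flag for review
```

Thresholding that cost gives a crude anomaly detector: the sample is suspicious in proportion to how hard it is to describe given what came before.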

Physics - The Universe Has a Complexity Score

If you think physics can be fully calculated and the universe follows specific rules, then there should be a shortest program that explains the universe's entire history. Finding a Theory of Everything is like looking for the simplest way to describe all of physical reality.

And here's the sting: we can never verify we've found it. A simpler set of laws might produce identical observations. The uncomputability result means we'd have no way to prove our theory is actually the shortest. Every Grand Unified Theory is, formally speaking, an upper bound pretending to be an answer.

AIXI - What Happens When You Build AI Out of This

In the early 2000s, Marcus Hutter asked what an ideal, maximally intelligent agent would look like if computational limits were ignored. The answer he arrived at was AIXI - a theoretical model of a perfect AI agent. AIXI assigns higher weight to simpler environments: shorter programs (simpler explanations) are given exponentially higher importance, a formal version of Occam's Razor. Its core loop:

1. Observe the world.
   Take in all sensory data and the history of interactions so far.

2. Build the simplest model consistent with observations.
   Use Kolmogorov Complexity to find the shortest program that explains everything seen so far.

3. Act to get the best future rewards based on that model.
   Pick actions that lead to the best results under the simplest model of the world.

4. Update. Repeat.
   Each new observation improves the model. Keep compressing. Keep getting better.
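The loop above can be caricatured with a finite, computable hypothesis class. Everything here - hypothesis names, description lengths - is invented for illustration; the real mixture over all programs, weighted 2^(−length), is exactly the part that cannot be computed:

```python
# A toy, computable caricature of AIXI's prediction step: a finite set of
# candidate "world programs", a 2**(-length) simplicity prior, Bayes update.
HYPOTHESES = {
    # name: (invented description length in bits, next-bit rule)
    "all zeros":   (4, lambda t: 0),
    "all ones":    (4, lambda t: 1),
    "alternate":   (8, lambda t: t % 2),
    "every third": (10, lambda t: 1 if t % 3 == 0 else 0),
}

def posterior(observed):
    """Keep hypotheses consistent with every observation, weighted by prior."""
    w = {}
    for name, (length, rule) in HYPOTHESES.items():
        consistent = all(rule(t) == bit for t, bit in enumerate(observed))
        w[name] = 2.0 ** (-length) if consistent else 0.0
    total = sum(w.values())
    return {name: v / total for name, v in w.items()} if total else w

def predict_next(observed):
    """Posterior-weighted vote on the next bit."""
    post = posterior(observed)
    t = len(observed)
    p_one = sum(post[name] * HYPOTHESES[name][1](t) for name in post)
    return 1 if p_one >= 0.5 else 0

print(predict_next([0, 1, 0, 1, 0, 1]))  # only "alternate" survives -> 0
```

AIXI does this over every possible program, with the true Kolmogorov-style prior; the toy does it over four hand-picked rules. The gap between the two is the uncomputability gap.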

AIXI can be mathematically proven to be optimal - no other agent can consistently outperform it across all possible environments. It is the formal theoretical ceiling of intelligence itself.

It cannot be built, because Step 2 requires computing Kolmogorov Complexity, which is uncomputable. AIXI is a provably optimal agent that cannot exist. It's the most useful concept in AI theory that will never run on any machine.

Why AIXI Still Matters Despite Being Uncomputable

AIXI defines the ceiling. Every real AI system — every large language model, every reinforcement learning agent, every decision tree — is a computable approximation of something that, in its perfect form, is grounded in Kolmogorov's idea.

When researchers design smarter AI, they are in a precise sense trying to close the gap to AIXI while staying within the bounds of what a computer can actually execute. The uncomputable ideal isn't a dead end. It's a compass pointing at something we can keep getting closer to, even if we can never fully arrive.

The Big Debate - Is Compression the Same as Understanding?

This is where the idea becomes genuinely controversial — and where serious researchers take hard, opposing positions.

If intelligence is compression, and compression is what our best AI systems do - finding compact representations, predicting structure, generalising from patterns — then maybe a model that compresses language well enough is, in some meaningful sense, actually understanding it. The debate:

The Compression Camp

Understanding is finding the shortest description. To understand something is to see the pattern beneath the noise. GPT-4 "understands" English because it has found an extraordinarily compact representation of its statistical structure. Intelligence is what good compression looks like from the outside.

The Meaning Camp

A ZIP file compresses a novel but understands nothing. Kolmogorov Complexity measures structural regularity, yet meaning, reference, and intentionality are entirely different. While compression might be necessary for intelligence, it is far from sufficient. The real challenge lies in what the compression represents.

No side has prevailed. Schmidhuber, a key figure in deep learning, has long claimed that curiosity, creativity, and intelligence all come down to seeking compression. The other side thinks this view oversimplifies and misses the main point. The debate is a mix of computer science, philosophy of mind, and AI alignment, and it remains unresolved.

There's a final implication of all this that rarely makes it into the textbooks, but probably should.

Science is about finding simpler ways to describe the world. Every theory, model, and equation tries to make observations shorter and more general. Newton's laws sum up the motion of all objects in three sentences. Maxwell's equations cover classical electromagnetism in four. General relativity explains spacetime curvature with one compact equation. This is Kolmogorov thinking. Science is compression in action.

But here's what the uncomputability result tells us that we tend to quietly ignore: we can never know if any of our theories are actually the shortest description. A simpler set of equations might produce every observation we've ever made. We would have no way to prove it doesn't exist. Every Theory of Everything is, mathematically speaking, an upper bound on the complexity of the universe - not a proof that we've reached the minimum.

We have Occam's Razor - prefer the simpler explanation. We also have a mathematical proof that we can never confirm which explanation is truly simplest. Kolmogorov gives you the razor and removes the certainty of knowing when to stop shaving.

The shortest program that outputs the universe exists - in principle.
We just can never prove we've found it.

Every theory is an upper bound.
Every model is an approximation of something unreachable.
Every answer is a best guess at a minimum we can approach but never touch.

Which is either the most humbling thing in the history of human thought -
or the most compelling reason to keep going.

Probably both.

Further reading: Li & Vitányi - An Introduction to Kolmogorov Complexity and Its Applications · Marcus Hutter's AIXI paper · Schmidhuber on compression and creativity · Chaitin on algorithmic information theory