I recently got inspired by a podcast episode, “Searching For Meaning In Randomness” (from The Rest Is Science). It sparked an idea: what happens when you try to compress something that is truly random?
Here’s the intuition they discussed, in a different context:
Suppose you roll a die six times and get the same number every time: 6, 6, 6, 6, 6, 6. That’s easy to describe and compress. You could just say, “I rolled six sixes”.
Now suppose the rolls are completely random: 3, 1, 5, 2, 6, 4. There’s no pattern. Each piece of information is independent and needs to be delivered individually. There’s nothing to compress.
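To put a rough number on that intuition: a fair die roll carries log2(6) ≈ 2.58 bits of information, so six independent rolls need about 15.5 bits no matter how cleverly you encode them. A quick back-of-the-envelope check:

```python
import math

# Information content of one fair, independent die roll, in bits.
bits_per_roll = math.log2(6)

print(f"one roll:  {bits_per_roll:.2f} bits")      # ~2.58 bits
print(f"six rolls: {6 * bits_per_roll:.2f} bits")  # ~15.51 bits
```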
I decided to test this idea against actual compression algorithms. I generated a 1 MB file of cryptographically random bytes (essentially maximum entropy) and ran a few common compressors: gzip, xz, zstd, and brotli.
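The exact script doesn’t matter much; here’s a minimal Python sketch of the same setup using only the standard library (gzip, plus xz via the lzma module). zstd and brotli need third-party bindings, so they’re left out of the sketch:

```python
import os
import gzip
import lzma

SIZE = 1_048_576  # 1 MB

# os.urandom pulls from the OS CSPRNG, so the bytes are effectively maximum entropy.
data = os.urandom(SIZE)
with open("output.bin", "wb") as f:
    f.write(data)

# Compress in memory at each codec's maximum level and compare sizes.
candidates = {
    "gzip": gzip.compress(data, compresslevel=9),
    "xz":   lzma.compress(data, preset=9),
}

print(f"Original size: {SIZE:,} bytes")
for name, blob in candidates.items():
    print(f"{name:<4} {len(blob):,} bytes  ratio {len(blob) / SIZE:.4f}")
```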
Here are the results from my actual run, with all four compressors at their maximum level:
```text
Generating high-entropy file...
Size: 1.00 MB (1,048,576 bytes)
Output: output.bin
Successfully generated 1,048,576 bytes of high-entropy data
Output file: output.bin

Compression comparison (entropy vs lossless at max level):
Original size: 1,048,576 bytes (1.00 MB)
gzip    1,048,925 bytes  ratio 1.0003  -> output.bin.gz
xz      1,048,688 bytes  ratio 1.0001  -> output.bin.xz
zstd    1,048,613 bytes  ratio 1.0000  -> output.bin.zst
brotli  1,048,584 bytes  ratio 1.0000  -> output.bin.br

Entropy wins: best compressed size >= original (compression useless)
```
Exactly as expected! The file didn’t shrink at all. In fact, every compressor made it slightly larger, because each format adds its own header and metadata overhead.
This is a simple but powerful demonstration of why compression depends on patterns. Lossless compression works by finding redundancy and encoding it more compactly, so if your data is fully random (maximum entropy), there’s literally nothing to exploit.
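You can watch the same contrast in a few lines of Python: a megabyte of repeated bytes (the “six sixes” case, scaled up) collapses to almost nothing, while a megabyte of random bytes doesn’t budge. This sketch uses zlib from the standard library rather than the command-line tools above, but the effect is the same:

```python
import os
import zlib

SIZE = 1_000_000

repetitive = b"6" * SIZE        # the "six sixes" case, scaled up
random_data = os.urandom(SIZE)  # the maximum-entropy case

for label, payload in [("repetitive", repetitive), ("random", random_data)]:
    compressed = zlib.compress(payload, 9)  # level 9 = maximum compression
    print(f"{label:<10} {len(payload):,} -> {len(compressed):,} bytes")
```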
It’s also a neat mental model:
- Repetitive, predictable data = compressible.
- Truly random data = incompressible.
It might seem obvious in hindsight, but actually generating a fully random file and watching the compressors fail in real time drives the point home.