Chinchilla's 2% Solution: Compressing Text with Tiny Models

It occurs to me that the Chinchilla scaling law can also be interpreted as a compute-optimal neural compression law.

That is, it can be re-stated as:

To compress K bytes of text (under a certain lossy optimality criterion), a model capacity of K/50 bytes is required.
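The K/50 figure is roughly consistent with Chinchilla's ~20 training tokens per parameter. A back-of-envelope sketch, where the bytes-per-token and bytes-per-parameter values are my assumptions rather than anything stated in the post:

```python
# Rough check of the K/50 ratio. Assumptions (not from the post):
# Chinchilla-optimal training uses ~20 tokens per parameter,
# a token covers ~4 bytes of raw text, and weights are fp16 (2 bytes).
tokens_per_param = 20
bytes_per_token = 4
bytes_per_param = 2

text_bytes_per_model_byte = tokens_per_param * bytes_per_token / bytes_per_param
print(text_bytes_per_model_byte)  # -> 40.0
```

This lands at ~40 text bytes per model byte, the same order of magnitude as the stated K/50; the exact constant shifts with the tokenizer and weight precision assumed.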

I find the above form more…

Better numbers are at https://bellard.org/nncp/

For enwik9, gzip achieves only 68% compression while transformers achieve 88%, both lossless.
