Zipf Happens: Transformers Love Uneven Data

Interesting paper: it argues that transformers work so well in part because they are trained on “Zipfian” data. Emergent behaviour such as in-context learning does not appear when the data lacks this property (for example, uniform i.i.d. data).

https://arxiv.org/abs/2205.05055
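
To make “Zipfian” concrete, here is a minimal sketch that compares class frequencies drawn from a Zipfian distribution with those drawn from a uniform one. It is only an illustration, not the paper's setup: the helper names (`zipf_weights`, `sample_classes`), the 1000 classes, and the exponent of 1.0 are all assumptions picked for the example.

```python
import random
from collections import Counter


def zipf_weights(num_classes, exponent=1.0):
    """Normalized Zipfian probabilities: p(rank) is proportional to 1 / rank**exponent."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, num_classes + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_classes(weights, num_samples, seed=0):
    """Draw class indices i.i.d. from the given probabilities (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Assumed toy sizes, chosen only for illustration.
num_classes = 1000
zipf = zipf_weights(num_classes, exponent=1.0)
uniform = [1.0 / num_classes] * num_classes

zipf_counts = Counter(sample_classes(zipf, 10_000))
uniform_counts = Counter(sample_classes(uniform, 10_000))

# A few classes dominate the Zipfian sample, while most classes are rare or absent.
print("Zipfian top 5:", zipf_counts.most_common(5))
print("Uniform top 5:", uniform_counts.most_common(5))
print("Distinct classes seen (Zipfian):", len(zipf_counts))
print("Distinct classes seen (uniform):", len(uniform_counts))
```

With an exponent of 1.0, the most frequent class is twice as likely as the second and a thousand times as likely as the last, so the sample mixes a handful of very common classes with a long tail of rare ones. The uniform sample has no such tail.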
