Chinchilla's Lesson: Bigger Data Beats Bigger Models

The main insight here: per the Chinchilla paper, which showed that GPT-3 was not trained compute-optimally, a ~30B-parameter model trained on a larger dataset can match the quality of a 175B-parameter model. Since total training compute scales with both parameter count and token count, the cost reduction for training follows from the smaller model's lower compute. https://x.com/NaveenGRao/status/1575589170709291008
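
As a rough illustration of the cost argument (not from the tweet itself), here is a back-of-the-envelope sketch using the widely cited C ≈ 6ND training-FLOPs approximation and Chinchilla's ~20 tokens-per-parameter rule of thumb; the specific token counts are assumptions for illustration:

```python
# Back-of-the-envelope comparison using the standard C ~ 6*N*D
# training-FLOPs approximation and Chinchilla's ~20 tokens-per-parameter
# rule of thumb. Token counts are illustrative assumptions.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute: C ~ 6 * N * D FLOPs."""
    return 6 * params * tokens

# GPT-3-style run: 175B params on ~300B tokens (per the GPT-3 paper).
gpt3 = train_flops(175e9, 300e9)

# Chinchilla-style 30B run: ~20 tokens per parameter -> 600B tokens.
chinchilla_30b = train_flops(30e9, 20 * 30e9)

print(f"175B on 300B tokens : {gpt3:.2e} FLOPs")
print(f" 30B on 600B tokens : {chinchilla_30b:.2e} FLOPs")
print(f"compute ratio       : {gpt3 / chinchilla_30b:.1f}x")
```

Under these assumptions, the 30B run costs roughly 1.1e23 FLOPs versus roughly 3.2e23 for the 175B-style run: even with twice the data, training the smaller model is about 3x cheaper in raw compute.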
