Chinchilla's Lesson: Bigger Data Beats Bigger Models
The core insight: at a fixed compute budget, a smaller model trained on more data can match a much larger one. The Chinchilla paper (Hoffmann et al., 2022) showed that GPT-3 and its contemporaries were not trained compute-optimally, and the linked thread argues you can match the quality of a 175B-parameter model with a roughly 30B-parameter model by scaling up the dataset. The training cost reduction then follows from the reduced compute. https://x.com/NaveenGRao/status/1575589170709291008
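To make the trade-off concrete, here is a minimal sketch of compute-optimal sizing, assuming the two commonly cited approximations from the Chinchilla paper: training compute C ≈ 6·N·D FLOPs (N parameters, D tokens) and the compute-optimal ratio D ≈ 20·N tokens per parameter. The GPT-3 figures (175B parameters, ~300B tokens) are the published ones; the function names are illustrative.

```python
import math

TOKENS_PER_PARAM = 20       # Chinchilla's approximate optimal tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6   # rough forward + backward FLOPs per parameter per token

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute: C ~= 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens

def compute_optimal(budget_flops: float) -> tuple[float, float]:
    """Given a FLOP budget, return the (params, tokens) pair that
    spends it all (C = 6 * N * D) at the optimal ratio D = 20 * N."""
    params = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params

# GPT-3's rough training budget: 175B params on ~300B tokens.
gpt3_budget = train_flops(175e9, 300e9)   # ~3.15e23 FLOPs
n_opt, d_opt = compute_optimal(gpt3_budget)
print(f"GPT-3 budget:     {gpt3_budget:.2e} FLOPs")
print(f"Compute-optimal:  {n_opt / 1e9:.0f}B params on {d_opt / 1e9:.0f}B tokens")
```

Plugging in GPT-3's budget yields roughly a 50B-parameter model trained on about 1T tokens: the same compute, spent on a smaller model and far more data, which is the pattern behind the 30B-vs-175B claim above.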