Chinchilla's Lesson: Bigger Data Beats Bigger Models
The core insight: at a fixed compute budget, a smaller model trained on more data can match a much larger one. The Chinchilla paper (Hoffmann et al., 2022) showed that GPT-3 and its contemporaries were not trained compute-optimally, and the linked thread argues you can match the quality of a 175B-parameter model with a roughly 30B-parameter model by scaling up the dataset. The training cost reduction then follows from the reduced compute. https://x.com/NaveenGRao/status/1575589170709291008
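To make the trade-off concrete, here is a minimal sketch of compute-optimal sizing, assuming the two commonly cited approximations from the Chinchilla paper: training compute C ≈ 6·N·D FLOPs (N parameters, D tokens) and the compute-optimal ratio D ≈ 20·N tokens per parameter. The GPT-3 figures (175B parameters, ~300B tokens) are the published ones; the function names are illustrative.

```python
import math

TOKENS_PER_PARAM = 20       # Chinchilla's approximate optimal tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6   # rough forward + backward FLOPs per parameter per token

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute: C ~= 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens

def compute_optimal(budget_flops: float) -> tuple[float, float]:
    """Given a FLOP budget, return the (params, tokens) pair that
    spends it all (C = 6 * N * D) at the optimal ratio D = 20 * N."""
    params = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params

# GPT-3's rough training budget: 175B params on ~300B tokens.
gpt3_budget = train_flops(175e9, 300e9)   # ~3.15e23 FLOPs
n_opt, d_opt = compute_optimal(gpt3_budget)
print(f"GPT-3 budget:     {gpt3_budget:.2e} FLOPs")
print(f"Compute-optimal:  {n_opt / 1e9:.0f}B params on {d_opt / 1e9:.0f}B tokens")
```

Plugging in GPT-3's budget yields roughly a 50B-parameter model trained on about 1T tokens: the same compute, spent on a smaller model and far more data, which is the pattern behind the 30B-vs-175B claim above.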