Chinchillas and Flashy Attention: How ChatGPT Got 10X Faster
·47 words·1 min
Many people are surprised that ChatGPT's inference cost is 10X lower than GPT-3's, but the field has seen many advances over the past two years.
The top two are:
- Chinchilla already showed a 2.5X reduction in model size at GPT-3-level quality.
- FlashAttention adds another 5-6X gain.
Multiplied together, those two factors alone give roughly 12-15X.
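A quick back-of-the-envelope sketch of how the two factors combine, assuming (a simplification) that the gains compose multiplicatively:

```python
# Chinchilla: comparable quality at ~2.5X fewer parameters than GPT-3
chinchilla_size_reduction = 2.5
# FlashAttention: reported 5-6X speedup range
flash_attention_speedup = (5, 6)

# Combine the factors multiplicatively (a rough assumption, not a measurement)
combined = [chinchilla_size_reduction * f for f in flash_attention_speedup]
print(f"Combined speedup: {combined[0]:.1f}X - {combined[1]:.1f}X")
# → Combined speedup: 12.5X - 15.0X
```

This is only a rough estimate; the point is that stacking the two advances comfortably covers a 10X reduction in inference cost.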