
Chinchillas and Flashy Attention: How ChatGPT Got 10X Faster


A lot of people are surprised that ChatGPT inference costs roughly 10X less than GPT-3, but the field has made many advances over the past two years.

The top two are:

  1. Chinchilla already showed that a compute-optimal model can match GPT-3 quality at roughly 2.5X smaller size (70B vs. 175B parameters).

  2. Using FlashAttention adds another 5-6X gain in attention speed and memory.
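
The Chinchilla numbers in point 1 are easy to sanity-check with back-of-envelope arithmetic. The figures below are the published GPT-3 and Chinchilla parameter/token counts; the "~20 tokens per parameter" rule is a common reading of the Chinchilla result, not an exact law, and `6*N*D` is the standard FLOPs approximation for training:

```python
gpt3_params = 175e9         # GPT-3 parameter count
gpt3_tokens = 300e9         # GPT-3 training tokens

chinchilla_params = 70e9    # Chinchilla parameter count
chinchilla_tokens = 1.4e12  # Chinchilla training tokens (~20 tokens/param)

# Training compute, using the standard 6 * N * D FLOPs approximation.
gpt3_flops = 6 * gpt3_params * gpt3_tokens
chinchilla_flops = 6 * chinchilla_params * chinchilla_tokens

print(f"model size reduction:  {gpt3_params / chinchilla_params:.1f}x")   # 2.5x
print(f"tokens per parameter:  {chinchilla_tokens / chinchilla_params:.0f}")
print(f"compute ratio (Chinchilla / GPT-3): {chinchilla_flops / gpt3_flops:.2f}")
```

The model is 2.5X smaller for modestly more training compute, and a 2.5X smaller model is roughly 2.5X cheaper to serve at inference time, which is where the cost savings show up.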

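The core algorithmic idea behind point 2 is the online (streaming) softmax: attention can be computed block-by-block over the keys and values, carrying a running max and normalizer per query row, so the full N×N score matrix is never materialized. Here is a minimal NumPy sketch of that trick; the real FlashAttention kernel fuses this loop into SRAM-resident GPU tiles, so this illustrates only the algorithm, not the speed:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix: O(N^2) memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=32):
    # FlashAttention-style streaming softmax: process K/V in blocks,
    # keeping a running row max (m) and normalizer (l) per query,
    # rescaling the accumulated output whenever the max changes.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row max
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)             # only N x block scores live
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale previous accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Both functions return identical results; the tiled version just never holds more than one N×block slice of scores at a time, which is what lets the real kernel keep everything in fast on-chip memory instead of HBM.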
Discussion