Pretraining Has Gotten 10x Faster in the Past Year!
While digging up some historical numbers, it hit me that LLM training is now ~10x faster than it was at the same time last year, thanks to a pile of improvements: H100 availability, FlashAttention-2, new kernels, torch.compile, CUDA graphs, FP8, and more!
That’s just the past 12 months!!
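To make a couple of those items concrete, here is a minimal sketch (not the setup behind the numbers above, just an illustration) of turning on two of them in stock PyTorch 2.x: fused FlashAttention-style attention via `scaled_dot_product_attention`, and kernel fusion plus CUDA-graph capture via `torch.compile`. The `TinySelfAttention` module is a toy model invented for the example; FP8 usually needs extra tooling (e.g. NVIDIA Transformer Engine) and is left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySelfAttention(nn.Module):
    """Toy single-head self-attention block, only here to illustrate the APIs."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Dispatches to a fused attention kernel (FlashAttention-2 backend)
        # on supported GPUs instead of the naive matmul + softmax path.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinySelfAttention(dim=256).to(device)

# torch.compile fuses ops; "reduce-overhead" mode also captures CUDA graphs
# on GPU to cut per-step launch overhead.
compiled_model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(4, 128, 256, device=device)
# Mixed precision (bf16) is another of the cheap wins listed above.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    y = compiled_model(x)
print(y.shape)
```

Each of these is roughly a one-line change on top of an existing training loop, which is a big part of why the speedups stacked up so quickly.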