
Pretraining Has Gotten 10x Faster in the Past Year!


While digging up some historical numbers, it hit me that LLM training is now ~10x faster than it was at the same time last year, thanks to a pile of improvements: H100 availability, FlashAttention-2, new kernels, torch.compile, CUDA graphs, FP8, etc.!
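
Several of these wins are nearly one-liners in modern PyTorch. Here is a minimal, hypothetical sketch (the `Attention` module, dimensions, and hyperparameters are illustrative, not from any particular training run) showing three of them together: `scaled_dot_product_attention`, which dispatches to FlashAttention-style fused kernels on supported GPUs; `torch.compile` with `mode="reduce-overhead"`, which generates fused kernels and captures CUDA graphs to cut launch overhead; and bf16 autocast (FP8 needs extra tooling and is left out here).

```python
import torch
import torch.nn.functional as F

class Attention(torch.nn.Module):
    """Toy causal self-attention block (shapes are illustrative)."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) for attention.
        q, k, v = (z.view(b, t, self.heads, d // self.heads).transpose(1, 2)
                   for z in (q, k, v))
        # Uses fused (FlashAttention-style) kernels when available.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(b, t, d)
        return self.proj(y)

model = Attention(dim=1024, heads=16).cuda()

# Fused codegen + CUDA graph capture to reduce kernel launch overhead.
model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 2048, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(x)
```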

That’s just the past 12 months!
