LLM Training Hits Hyperdrive: 10x Faster in a Year!
While digging up some historical numbers, it hit me that LLM training is now roughly 10x faster than it was at this time last year, thanks to a pile of improvements: H100 availability, FlashAttention-2, new kernels, torch.compile, CUDA graphs, FP8, and more!
And that’s in just the past 12 months!