One Ring to Train Them All: Scaling Transformers with Ring Attention

·42 words·1 min

Groundbreaking work! “Our experiments show that Ring Attention can reduce the memory requirements of Transformers, enabling us to train sequences more than 500 times longer than the prior memory-efficient state of the art, and enabling the training of sequences that exceed 100 million… https://x.com/haoliuhl/status/1709630382457733596
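
The linked work is Ring Attention (Liu, Zaharia & Abbeel, 2023): each device holds only its own query block and the key/value blocks circulate around a ring of devices, with attention computed blockwise via an online softmax so no device ever materializes the full attention matrix. Below is a minimal single-process NumPy sketch of that blockwise core, with the ring communication simulated by iterating over KV blocks in hop order; the function and parameter names (`ring_attention_sim`, `num_devices`) are illustrative, not from the authors’ JAX codebase.

```python
import numpy as np

def ring_attention_sim(q, k, v, num_devices):
    """Simulate ring attention in one process: 'device' i owns one query
    block and streams KV blocks around the ring, so its working set is
    one query block plus one in-flight KV block, never the full sequence."""
    seq_len, d = q.shape
    block = seq_len // num_devices
    q_blocks = q.reshape(num_devices, block, d)
    k_blocks = k.reshape(num_devices, block, d)
    v_blocks = v.reshape(num_devices, block, d)

    out = np.zeros_like(q_blocks)
    for i in range(num_devices):              # device i holds q_blocks[i]
        # Online-softmax accumulators for numerically stable streaming.
        m = np.full((block, 1), -np.inf)      # running max of logits
        l = np.zeros((block, 1))              # running softmax denominator
        acc = np.zeros((block, d))            # unnormalized output
        for step in range(num_devices):
            j = (i + step) % num_devices      # KV block arriving this hop
            s = q_blocks[i] @ k_blocks[j].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
            p = np.exp(s - m_new)
            scale = np.exp(m - m_new)         # rescale old accumulators
            l = l * scale + p.sum(axis=-1, keepdims=True)
            acc = acc * scale + p @ v_blocks[j]
            m = m_new
        out[i] = acc / l                      # exact softmax attention
    return out.reshape(seq_len, d)
```

Because the online softmax rescales its running accumulators at every hop, the result is exact attention, not an approximation; in the real system the next KV block is transferred while the current one is being processed, hiding communication behind compute.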
