One Ring to Train Them All: Scaling Transformers with Ring Attention
Groundbreaking work! “Our experiments show that Ring Attention can reduce the memory requirements of Transformers, enabling us to train sequences more than 500 times longer than prior memory-efficient state-of-the-art methods, and enabling the training of sequences that exceed 100 million…” https://x.com/haoliuhl/status/1709630382457733596
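For context, the core idea behind Ring Attention is to split queries, keys, and values into blocks across devices arranged in a ring, then rotate the key/value blocks from host to host while each host accumulates attention for its own query block with a streaming (online) softmax, so the full attention matrix is never materialized on any single device. Below is a minimal single-process sketch of that accumulation in NumPy; the function and variable names are illustrative, not the authors' actual API.

```python
# Illustrative sketch of Ring Attention's blockwise accumulation (not the
# authors' implementation). We simulate `n` ring "devices", each owning one
# query block; key/value blocks hop around the ring, and each device folds
# every incoming block into its output via a numerically stable online softmax.
import numpy as np

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Each argument is a list of (block_len, d) arrays, one per device."""
    n = len(q_blocks)
    scale = 1.0 / np.sqrt(q_blocks[0].shape[-1])
    outputs = []
    for i in range(n):                            # device i owns query block i
        q = q_blocks[i]
        m = np.full((q.shape[0], 1), -np.inf)     # running row max
        l = np.zeros((q.shape[0], 1))             # running softmax normalizer
        acc = np.zeros_like(q)                    # unnormalized output
        for step in range(n):                     # KV block hops around the ring
            j = (i + step) % n
            s = (q @ k_blocks[j].T) * scale       # partial attention scores
            m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
            corr = np.exp(m - m_new)              # rescale old stats to new max
            p = np.exp(s - m_new)
            l = l * corr + p.sum(axis=-1, keepdims=True)
            acc = acc * corr + p @ v_blocks[j]
            m = m_new
        outputs.append(acc / l)                   # normalize once at the end
    return np.concatenate(outputs)

# Sanity check against ordinary full attention on random data.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((4, 8)) for _ in range(3)]
full = np.concatenate(blocks)
s = (full @ full.T) / np.sqrt(8)
p = np.exp(s - s.max(-1, keepdims=True))
ref = (p / p.sum(-1, keepdims=True)) @ full
assert np.allclose(ring_attention(blocks, blocks, blocks), ref)
```

This sketch only captures the arithmetic: because each device's result depends on every key/value block but only on its own query block, per-device memory scales with the block size rather than the full sequence length. The actual method additionally overlaps the ring communication of key/value blocks with the blockwise attention compute, which is what hides the transfer cost across devices.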