One Ring to Train Them All: Scaling Transformers with Ring Attention
    
    
Groundbreaking work! “Our experiments show that Ring Attention can reduce the memory requirements of Transformers, enabling us to train sequences more than 500 times longer than prior memory-efficient state of the art, and enabling the training of sequences that exceed 100 million…” https://x.com/haoliuhl/status/1709630382457733596
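
The tweet quotes the result but not the mechanism. The core idea of Ring Attention is that query blocks stay resident on their devices while key/value blocks circulate around a ring, and each incoming block is folded into a running blockwise softmax, so no device ever materializes the full attention matrix. Below is a minimal single-process Python sketch of that accumulation, assuming the ring can be simulated by rotating a list of KV blocks; the `ring_attention` helper and all names here are illustrative, not the paper's JAX implementation, where each rotation is a device-to-device transfer overlapped with compute.

```python
import numpy as np

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Illustrative sketch: each 'device' i owns q_blocks[i] and one (K, V)
    block. KV blocks rotate around the ring; every device folds each incoming
    block into its running output via the online-softmax trick, so the full
    (seq_len x seq_len) score matrix is never formed."""
    n = len(q_blocks)
    scale = 1.0 / np.sqrt(q_blocks[0].shape[-1])

    # Per-device running statistics for the streaming softmax.
    acc = [np.zeros_like(q) for q in q_blocks]               # unnormalized output
    run_max = [np.full(len(q), -np.inf) for q in q_blocks]   # running row max
    denom = [np.zeros(len(q)) for q in q_blocks]             # running denominator

    kv = list(zip(k_blocks, v_blocks))
    for _ in range(n):
        for i in range(n):
            k, v = kv[i]
            s = q_blocks[i] @ k.T * scale                    # scores for this block
            m_new = np.maximum(run_max[i], s.max(axis=-1))
            corr = np.exp(run_max[i] - m_new)                # rescale old stats
            p = np.exp(s - m_new[:, None])
            denom[i] = denom[i] * corr + p.sum(axis=-1)
            acc[i] = acc[i] * corr[:, None] + p @ v
            run_max[i] = m_new
        # Rotate KV blocks one hop around the ring (a stand-in for the
        # send/recv that real Ring Attention overlaps with compute).
        kv = kv[-1:] + kv[:-1]

    return np.concatenate([a / l[:, None] for a, l in zip(acc, denom)])

# Sanity check against ordinary full attention.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = ring_attention(np.split(q, 4), np.split(k, 4), np.split(v, 4))
s = q @ k.T / np.sqrt(16)
w = np.exp(s - s.max(-1, keepdims=True))
assert np.allclose(out, (w / w.sum(-1, keepdims=True)) @ v)
```

Because each device only ever holds its own query block plus one KV block at a time, the achievable sequence length scales with the number of devices in the ring, which is the source of the memory savings the tweet describes.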