RNN Strikes Back: SRU++ Overtakes Transformers
SRU++ is an RNN with attention that outperforms Transformer-XL and Longformer with 5x less training time at the same parameter count. https://x.com/taolei15949106/status/1364980529007845381