Transformer Tweaks: No Magic Bullet
“In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance.” https://x.com/colinraffel/status/1440043262853537794