Transformer Tweaks: No Magic Bullet

“In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance.” https://x.com/colinraffel/status/1440043262853537794