Linear Thinking: Simple Nets Tackle Language and Math
This new paper trains extremely simple linear(!) models and shallow MLPs that reach competitive performance (perplexity, in the language modeling case) on language modeling and 4-digit multiplication tasks! The claim seems to be that much of the magic lies in the autoregressive objective, not in the architecture.
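To make "linear model + autoregressive objective" concrete, here is a minimal sketch. It is not the paper's setup: the character-level tokens, the 3-character context window, the toy corpus and plain SGD are all illustrative assumptions. The model's next-token logits are a purely linear function of the concatenated one-hot context (softmax appears only in the loss), and it is trained with next-token cross-entropy, which is what makes it autoregressive.

```python
import math


def build_vocab(text):
    # Character-level vocabulary (an assumption for this sketch).
    return {c: i for i, c in enumerate(sorted(set(text)))}


def features(window, vocab):
    # Concatenated one-hot encoding of the context window, stored as the
    # indices of the active (=1) features: position * V + token id.
    V = len(vocab)
    return [pos * V + vocab[c] for pos, c in enumerate(window)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


class LinearNextToken:
    """Logits = W^T x + b, where x is the one-hot context. No hidden layers."""

    def __init__(self, vocab, context):
        self.vocab = vocab
        self.context = context
        V = len(vocab)
        self.W = [[0.0] * V for _ in range(context * V)]
        self.b = [0.0] * V
        self.inv = {i: c for c, i in vocab.items()}

    def logits(self, active):
        z = list(self.b)
        for f in active:
            for j, w in enumerate(self.W[f]):
                z[j] += w
        return z

    def step(self, window, target, lr):
        # One SGD step on the next-token cross-entropy loss.
        active = features(window, self.vocab)
        p = softmax(self.logits(active))
        y = self.vocab[target]
        loss = -math.log(p[y])
        grad = list(p)  # d(loss)/d(logits) = p - onehot(y)
        grad[y] -= 1.0
        for f in active:
            row = self.W[f]
            for j, g in enumerate(grad):
                row[j] -= lr * g
        for j, g in enumerate(grad):
            self.b[j] -= lr * g
        return loss

    def predict(self, window):
        z = self.logits(features(window, self.vocab))
        return self.inv[max(range(len(z)), key=z.__getitem__)]


def train(text, context=3, epochs=50, lr=0.5):
    vocab = build_vocab(text)
    model = LinearNextToken(vocab, context)
    # Autoregressive training pairs: (previous `context` chars, next char).
    pairs = [(text[i:i + context], text[i + context])
             for i in range(len(text) - context)]
    history = []  # per-epoch perplexity = exp(mean cross-entropy)
    for _ in range(epochs):
        total = sum(model.step(w, t, lr) for w, t in pairs)
        history.append(math.exp(total / len(pairs)))
    return model, history


def generate(model, prompt, n):
    # Greedy autoregressive decoding: feed each prediction back as input.
    out = prompt
    for _ in range(n):
        out += model.predict(out[-model.context:])
    return out


if __name__ == "__main__":
    model, history = train("hello world. " * 20)
    print(f"perplexity: first epoch {history[0]:.2f}, last epoch {history[-1]:.2f}")
    print(generate(model, "hel", 20))
```

On a toy corpus this only shows the mechanics (perplexity falls well below the uniform baseline of the vocabulary size); it says nothing about how far such models scale, which is what the paper's language modeling and multiplication experiments address.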