Linear Thinking: Simple Nets Tackle Language and Math

This new paper trains extremely simple linear(!) and shallow MLP networks that are competitive on language modeling (measured by perplexity) and on 4-digit multiplication tasks! The claim seems to be that much of the magic is in the autoregressive objective, not the architecture.

https://arxiv.org/abs/2309.06979
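
To make the "linear next-token predictor" idea concrete, here is a toy sketch in NumPy. It is not the paper's setup: the repeating 4-token sequence, the context length of 2, the plain gradient-descent loop, and the helper names (`make_windows`, `train_linear_lm`, `generate`) are all assumptions made for illustration. It only shows the general shape of the approach: one weight matrix, a softmax over the vocabulary, a next-token cross-entropy loss, and generation by feeding predictions back in.

```python
import numpy as np

def make_windows(tokens, context, vocab):
    """Build (one-hot context window, next token) training pairs."""
    X, y = [], []
    for i in range(len(tokens) - context):
        onehot = np.zeros(context * vocab)
        for pos, tok in enumerate(tokens[i:i + context]):
            onehot[pos * vocab + tok] = 1.0
        X.append(onehot)
        y.append(tokens[i + context])
    return np.array(X), np.array(y)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_linear_lm(X, y, vocab, steps=500, lr=0.5):
    """Fit a single weight matrix W with next-token cross-entropy (no hidden layer)."""
    W = np.zeros((X.shape[1], vocab))
    rows = np.arange(len(y))
    losses = []
    for _ in range(steps):
        probs = softmax(X @ W)
        losses.append(-np.log(probs[rows, y]).mean())
        grad = probs.copy()
        grad[rows, y] -= 1.0              # d(loss)/d(logits) for softmax + cross-entropy
        W -= lr * X.T @ grad / len(y)
    return W, losses

def generate(W, prompt, n_new, context, vocab):
    """Greedy autoregressive decoding: each prediction is appended and fed back in."""
    tokens = list(prompt)
    for _ in range(n_new):
        x = np.zeros((1, context * vocab))
        for pos, tok in enumerate(tokens[-context:]):
            x[0, pos * vocab + tok] = 1.0
        tokens.append(int(np.argmax(x @ W)))
    return tokens

# Toy data (an assumption for this sketch): the pattern 0, 1, 2, 3 repeated.
VOCAB, CONTEXT = 4, 2
data = [0, 1, 2, 3] * 20
X, y = make_windows(data, CONTEXT, VOCAB)
W, losses = train_linear_lm(X, y, VOCAB)
sample = generate(W, [0, 1], 6, CONTEXT, VOCAB)
print(losses[0], losses[-1], sample)
```

The only "architecture" here is the matrix `W`; everything else is the autoregressive training objective and the decoding loop, which is the part the post suggests carries most of the weight.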
