
Architecture vs Auto-regressive Objective


This new paper trains extremely simple linear(!) models and shallow MLPs and gets competitive perplexity on language modeling, plus competitive results on 4-digit multiplication. The claim seems to be that much of the magic lies in the auto-regressive objective, not in the architecture.

https://arxiv.org/abs/2309.06979
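To make "linear next-token predictor" concrete, here is a toy sketch (not the paper's code or setup). The logits are a purely linear function of the one-hot-encoded context window, followed only by a softmax, and the model is trained with nothing but the auto-regressive cross-entropy loss and plain SGD. The character corpus, context length, learning rate, epoch count, and every name below (`LinearNextToken`, `make_examples`, `perplexity`, `generate`) are hypothetical choices for illustration. The paper's models and data are far larger.

```python
import math
import random


def make_examples(tokens, context_len):
    """Split a token sequence into (context window, next token) pairs."""
    return [(tokens[i - context_len:i], tokens[i])
            for i in range(context_len, len(tokens))]


class LinearNextToken:
    """Hypothetical linear model: logit[v] = b[v] + sum_j W[j][context[j]][v].
    No hidden layers and no nonlinearity before the softmax."""

    def __init__(self, vocab_size, context_len):
        self.V, self.k = vocab_size, context_len
        # One weight row per (position, token) pair, i.e. a linear map on one-hot inputs.
        self.W = [[[0.0] * vocab_size for _ in range(vocab_size)]
                  for _ in range(context_len)]
        self.b = [0.0] * vocab_size

    def probs(self, context):
        logits = list(self.b)
        for j, tok in enumerate(context):
            for v in range(self.V):
                logits[v] += self.W[j][tok][v]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sgd_step(self, context, target, lr):
        # Gradient of the next-token cross-entropy w.r.t. the logits: p - onehot(target).
        grad = self.probs(context)
        grad[target] -= 1.0
        for v in range(self.V):
            self.b[v] -= lr * grad[v]
        for j, tok in enumerate(context):
            for v in range(self.V):
                self.W[j][tok][v] -= lr * grad[v]


def perplexity(model, examples):
    """exp of the mean negative log-likelihood of the true next token."""
    nll = -sum(math.log(model.probs(c)[t]) for c, t in examples) / len(examples)
    return math.exp(nll)


def generate(model, prompt, n_new):
    """Greedy auto-regressive decoding: feed each prediction back as input."""
    out = list(prompt)
    for _ in range(n_new):
        p = model.probs(out[-model.k:])
        out.append(max(range(model.V), key=lambda v: p[v]))
    return out


# Toy character-level corpus (an assumption, chosen to be trivially learnable).
text = "abcd" * 12
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
tokens = [stoi[ch] for ch in text]

examples = make_examples(tokens, context_len=2)
model = LinearNextToken(len(vocab), context_len=2)
ppl_before = perplexity(model, examples)  # all-zero weights give uniform predictions

random.seed(0)
for epoch in range(100):
    random.shuffle(examples)
    for context, target in examples:
        model.sgd_step(context, target, lr=0.5)

ppl_after = perplexity(model, examples)
sample = "".join(vocab[i] for i in generate(model, tokens[:2], 6))
print(f"perplexity {ppl_before:.3f} -> {ppl_after:.3f}, sample: {sample}")
```

Perplexity starts at the vocabulary size (4, uniform guessing) and drops close to 1, and greedy decoding from the prompt "ab" continues the pattern. A periodic toy sequence obviously says nothing about scale; it only shows that the model itself is just a linear map plus softmax, while all sequential behavior comes from the next-token training and decoding loop.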
