Big Brains: When AI Levels Up with Scale


This is going to be a milestone paper. Models are able to solve certain tasks only once they exceed a certain size and training compute! The authors identify many such tasks and measure the size and compute at which the ability to solve them suddenly emerges. https://arxiv.org/abs/2206.07682

For example, a model’s ability to do 9-digit out-of-domain addition doesn’t emerge until ~1.3×10^20 training FLOPs (100M parameters). That’s not to say better architectures can’t bring this threshold down.
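
As a quick back-of-the-envelope check (using the common FLOPs ≈ 6·N·D approximation, which is my assumption here, not a figure from the paper), that threshold implies a training budget on the order of a couple hundred billion tokens for a 100M-parameter model:

```python
# Rough sanity check (not from the paper): the common approximation
# training FLOPs ~= 6 * N * D, where N is the parameter count and D is
# the number of training tokens, lets us back out how much data a 100M
# parameter model would need to reach the ~1.3e20 FLOP threshold.

PARAMS = 100e6            # N: model parameters (100M)
FLOPS_THRESHOLD = 1.3e20  # training compute where the ability emerges

tokens = FLOPS_THRESHOLD / (6 * PARAMS)
print(f"~{tokens:.2e} training tokens")  # ~2.17e+11, roughly 200B tokens
```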

What remains to be seen is whether large models are simply covering a much longer tail of the distribution, whizzing through more expansive test sets and thus creating the appearance of a “new ability” that already existed in smaller models and would have been visible at smaller scale had the test set been less expansive.
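
A toy sketch of this concern (my own illustration, not an analysis from the paper): if per-digit accuracy improves smoothly with scale, then all-or-nothing exact-match scoring on longer sums sits near zero until per-digit accuracy is already high, so the more expansive the test set, the more a smooth gain can look like a sudden new ability.

```python
# Toy illustration (speculative, not from the paper): exact-match
# accuracy on an L-digit sum is roughly p**L when each digit succeeds
# independently with probability p. A smooth rise in p then produces a
# near-discontinuous jump on long sums.

per_digit_acc = [0.50, 0.70, 0.85, 0.95, 0.99]  # smooth improvement with scale
for p in per_digit_acc:
    print(f"p={p:.2f}  2-digit: {p**2:.3f}  9-digit: {p**9:.5f}")
# 9-digit accuracy stays below 0.05 until p ~ 0.7, then jumps past 0.9
```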
