Double or Nothing: Scaling Models and Tokens Equally

22 April 2022

“we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled.” https://x.com/papers_daily/status/1517077669833318406
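
As a rough illustration of what "scaled equally" implies for a compute budget, here is a minimal sketch. It assumes the common approximation that training compute is about 6 FLOPs per parameter per token (C ≈ 6·N·D); that approximation is not stated in the quote above. The function names and the starting model size and token count are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Assumption (not from the quote): training compute C ~ 6 * N * D FLOPs,
# where N is the parameter count and D is the number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute under the C ~ 6*N*D assumption."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def scale_equally(n_params: float, n_tokens: float, compute_multiplier: float):
    """Grow parameters and tokens by the same factor for a larger budget.

    Because compute is proportional to N * D, scaling both by the same
    factor means each grows with the square root of the compute multiplier.
    """
    factor = math.sqrt(compute_multiplier)
    return n_params * factor, n_tokens * factor


# Hypothetical starting point: 1B parameters trained on 20B tokens.
base_params, base_tokens = 1e9, 20e9

# Quadrupling compute doubles both the model size and the token count.
new_params, new_tokens = scale_equally(base_params, base_tokens, 4.0)

print(f"params: {base_params:.2e} -> {new_params:.2e}")
print(f"tokens: {base_tokens:.2e} -> {new_tokens:.2e}")
print(f"compute ratio: {train_flops(new_params, new_tokens) / train_flops(base_params, base_tokens):.1f}x")
```

Under this assumption, one doubling of model size paired with one doubling of tokens costs four times the compute, so each doubling of compute corresponds to growing both quantities by a factor of about 1.41.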
