PaLM2-L's Token Diet: Outsmarting Chinchilla and LLaMA
The leaked PaLM2 numbers below are a bit unusual. They point to the new scaling-law paper that was alluded to.
- Chinchilla-optimal tokens for 340B parameters: 8.7T.
- LLaMA-style optimal tokens: at least 26T.
This means PaLM2-L needed 2.5× fewer tokens than the Chinchilla standard suggests, and even fewer by LLaMA standards. https://x.com/ml_hardware/status/1658936724943142913
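A quick back-of-the-envelope sketch of the arithmetic above, using only the numbers stated in this post. The actual PaLM2-L token count is not public; the ~3.5T figure here is merely what the stated 2.5× ratio implies, not a confirmed value.

```python
# Numbers as stated in the post (tokens, not parameters).
chinchilla_optimal = 8.7e12  # Chinchilla-optimal tokens for 340B params
llama_optimal = 26e12        # LLaMA-style optimum, at least 26T

# Implied PaLM2-L training tokens, backed out from the stated 2.5x ratio.
implied_palm2_tokens = chinchilla_optimal / 2.5  # roughly 3.5T

print(f"Implied PaLM2-L tokens: {implied_palm2_tokens / 1e12:.2f}T")
print(f"Shortfall vs Chinchilla: {chinchilla_optimal / implied_palm2_tokens:.1f}x")
print(f"Shortfall vs LLaMA: {llama_optimal / implied_palm2_tokens:.1f}x")
```

The last ratio shows why "even fewer by LLaMA standards" holds: against a 26T optimum, the implied token budget is short by roughly 7.5×, not just 2.5×.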