
PaLM2-L's Token Diet: Outsmarting Chinchilla and LLaMA

·48 words·1 min

The leaked PaLM numbers below are a bit unusual. They point to the new scaling-law paper alluded to.

  • The Chinchilla-optimal token count for a 340B model is 8.7T.

  • By LLaMA's standards, the optimal token count is at least 26T.

This means PaLM2-L needed 2.5× fewer tokens than Chinchilla-optimal, and even fewer by LLaMA's standards. https://x.com/ml_hardware/status/1658936724943142913
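The ratios above can be sanity-checked with some quick arithmetic. This sketch plugs in only the figures claimed in the post (340B parameters, 8.7T Chinchilla-optimal tokens, 26T LLaMA-style tokens, and the "2.5× fewer" claim); the implied PaLM2-L token count is derived from those numbers, not independently reported here.

```python
# Back-of-the-envelope check of the ratios in the post.
# All inputs are the post's leaked/claimed numbers, not official figures.
params = 340e9               # claimed PaLM2-L parameter count
chinchilla_tokens = 8.7e12   # Chinchilla-optimal tokens for 340B (per the post)
llama_tokens = 26e12         # LLaMA-style "optimal" tokens (per the post)

# Implied by "2.5x fewer tokens than Chinchilla-optimal"
palm2_tokens = chinchilla_tokens / 2.5

print(f"Chinchilla tokens/param: {chinchilla_tokens / params:.1f}")    # ~25.6
print(f"Implied PaLM2-L tokens:  {palm2_tokens / 1e12:.2f}T")          # ~3.48T
print(f"Shortfall vs LLaMA:      {llama_tokens / palm2_tokens:.1f}x")  # ~7.5x
```

Note the Chinchilla ratio implied by these leaked numbers (~25.6 tokens per parameter) is a bit above the ~20 tokens/parameter rule of thumb usually quoted from the Chinchilla paper.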
