
Big Models Love Tech Data: Scaling Laws Strike Again


A dataset for training language models, built from 22 sources. Great to see technical sources like PubMed, arXiv, Stack Exchange, and GitHub included. Models trained only on Common Crawl don't perform as well on technical language, but surprisingly the scaling law still applies, i.e., larger models do better. https://x.com/nabla_theta/status/1345130408170541056
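The scaling-law claim, loss falling roughly as a power law in parameter count, can be sketched as a straight-line fit in log-log space. The numbers below are purely illustrative, not from the linked work:

```python
import numpy as np

# Hypothetical eval losses for models of increasing size.
# A power law L(N) = a * N^(-b) is linear in log-log coordinates:
#   log L = log a - b * log N
params = np.array([1e7, 1e8, 1e9, 1e10])  # parameter counts (made up)
loss = np.array([4.2, 3.5, 2.9, 2.4])     # eval losses (made up)

slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
b = -slope  # scaling exponent: positive means loss shrinks as models grow
print(f"fitted exponent b = {b:.3f}")
```

A negative slope on the log-log plot (positive `b`) is what "larger models do better" looks like when it holds across scales.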
