Big Models Love Tech Data: Scaling Laws Strike Again
A dataset for training language models built from 22 sources. Great to see technical sources like PubMed, arXiv, Stack Exchange, and GitHub. Models trained only on Common Crawl don’t perform as well on technical text, but the scaling law still holds, i.e., larger models do better. https://x.com/nabla_theta/status/1345130408170541056
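For context, the scaling-law behavior alluded to here is usually summarized by the power-law fit from Kaplan et al. (2020); the form and constants below come from that paper, not from this post. Test loss falls predictably with non-embedding parameter count $N$:

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}$$

If that relation carries over to technical domains, doubling $N$ buys a fixed fractional drop in loss, which is consistent with the observation that bigger models close the gap on technical language even when the training mix is dominated by web text.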