Vocab Sizes: From Guilty Tens to Speedy Sixty-Fours
I am guilty of setting vocab sizes to multiples of 10. It turns out that changing this to a multiple of 64 gives a whopping 25% speedup! My guess is that if these sizes are not aligned with the numbers the GPU architecture is built around, which are often powers of 2 such as 16 or 64, then you leave some hardware idle. https://x.com/karpathy/status/1621578354024677377
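
To make the rounding concrete, here is a minimal sketch of how I would pad a vocab size up to the next multiple of 64. The helper name `pad_vocab_size` and its `multiple` parameter are my own invention for illustration, not part of any library, and the extra token slots it creates are assumed to simply go unused.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`.

    Hypothetical helper for illustration; the padded slots are
    assumed to be unused token ids in the embedding table.
    """
    if vocab_size <= 0 or multiple <= 0:
        raise ValueError("vocab_size and multiple must be positive")
    # Integer ceiling division, then scale back up to a multiple.
    return ((vocab_size + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    # 50257 is the GPT-2 vocab size; 50000 is a typical "round" choice.
    for size in (50000, 50257, 50304):
        print(size, "->", pad_vocab_size(size))
```

With the default of 64, a size of 50257 becomes 50304, and a size that is already aligned (such as 50304) is returned unchanged. Whether this helps, and by how much, depends on the model and the GPU, so it is worth timing both variants.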