Matmul Antics: Powers of Two and Strassen's Exile
A very succinct explanation of matmul at the hardware level. It should explain why Strassen's algorithm is never used in accelerators, and also why many hyperparameters, such as vocab size and number of heads, need to be multiples of a power of two (2^k). https://x.com/olafwillocx/status/1728707653772456135
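To make both points concrete, here is a minimal sketch of my own, not taken from the linked thread. It assumes a hypothetical hardware tile width of 64 (real tile sizes vary by accelerator and data type) and only counts scalar multiplications for Strassen, ignoring the extra additions and temporaries that are part of why it maps poorly onto fixed-size multiply-accumulate units.

```python
def pad_to_multiple(n: int, multiple: int = 64) -> int:
    """Round n up to the nearest multiple of `multiple`.

    `multiple=64` is an assumed tile width, chosen for illustration only.
    """
    return (n + multiple - 1) // multiple * multiple


def tile_utilization(n: int, tile: int = 64) -> float:
    """Fraction of the padded dimension that holds real data."""
    return n / pad_to_multiple(n, tile)


def naive_mults(n: int) -> int:
    """Scalar multiplications for a naive n x n by n x n matmul."""
    return n ** 3


def strassen_mults(n: int) -> int:
    """Scalar multiplications for Strassen, recursing down to 1 x 1.

    Assumes n is a power of two. Each level replaces 8 half-size
    products with 7, so the count is 7 ** log2(n).
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1  # log2(n) for a power of two
    return 7 ** k


if __name__ == "__main__":
    vocab_size = 50257  # GPT-2's vocab size, used here as an example
    print(pad_to_multiple(vocab_size))
    print(tile_utilization(vocab_size))
    print(naive_mults(4096), strassen_mults(4096))
```

With these assumptions, a vocab size of 50257 pads up to 50304, so almost nothing is wasted by rounding, while an unpadded dimension would leave the last tile mostly empty. The Strassen count looks better on paper, but it says nothing about memory traffic or how well the work fits the hardware's tiles, which is the point the thread makes.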