
Matmul Antics: Powers of Two and Strassen's Exile


A very succinct explanation of how matmul works at the hardware level. It should explain why Strassen's algorithm is practically never used in accelerators, and also why you need many hyperparameters, such as vocab size and number of heads, to be multiples of 2^k: https://x.com/olafwillocx/status/1728707653772456135
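The powers-of-two point can be made concrete with a minimal Python sketch. It rests on a simplifying assumption: a hypothetical matmul unit that only processes full 128 × 128 tiles, so every matrix dimension is effectively padded up to the next multiple of 128. The tile size and the helper names `padded_size` and `tile_utilization` are illustrative choices, not taken from the linked thread or from any specific accelerator.

```python
import math


def padded_size(dim: int, tile: int) -> int:
    """Round dim up to the next multiple of tile (what a tile-only unit computes)."""
    return math.ceil(dim / tile) * tile


def tile_utilization(m: int, n: int, k: int, tile: int = 128) -> float:
    """Fraction of the padded (m x k) @ (k x n) work that is actually useful.

    Hypothetical model: the hardware only executes full tile x tile blocks,
    so each of m, n, k is padded up to a multiple of `tile` before computing.
    """
    useful = m * n * k
    padded = padded_size(m, tile) * padded_size(n, tile) * padded_size(k, tile)
    return useful / padded


if __name__ == "__main__":
    # Square matmuls of size d x d x d under the 128-tile model.
    for d in (128, 129, 256):
        print(d, f"{tile_utilization(d, d, d):.3f}")
    # 128 1.000
    # 129 0.128
    # 256 1.000
```

Under this model, a 128 × 128 × 128 matmul fills every tile, while 129 × 129 × 129 pads each dimension to 256 and spends only about 12.8% of the computed products on useful work. Real kernels soften this with smaller tile shapes and masking, so treat it as a worst-case illustration rather than a measurement.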
