Matmul and Strassen's Algo
26 November 2023

A very succinct explanation of matmul at the hardware level. It should explain why Strassen's algorithm is never used in accelerators, and also why many hyperparameters, such as vocab size and number of heads, are chosen as multiples of 2^k.
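A minimal sketch of the arithmetic behind both claims. The assumptions are mine, not the source's: a hypothetical matmul unit that only processes whole square tiles of edge `TILE` (set to 128 here; real tile shapes differ across hardware), with every dimension padded up to a whole number of tiles. `padded`, `tile_utilization`, `strassen_one_level` and `classical` are illustrative names, not any library's API.

```python
import math

# Hypothetical tile edge of the accelerator's matmul unit (an assumption).
TILE = 128


def padded(dim: int, tile: int = TILE) -> int:
    """Round dim up to the next multiple of tile, as a tiled unit would have to."""
    return math.ceil(dim / tile) * tile


def tile_utilization(m: int, k: int, n: int, tile: int = TILE) -> float:
    """Fraction of issued multiply-accumulates that do useful work for an
    (m x k) @ (k x n) matmul when every dimension is padded to whole tiles."""
    useful = m * k * n
    issued = padded(m, tile) * padded(k, tile) * padded(n, tile)
    return useful / issued


def classical(n: int) -> tuple[int, int]:
    """Scalar (multiplications, additions) for the classical n x n matmul."""
    return n ** 3, n * n * (n - 1)


def strassen_one_level(n: int) -> tuple[int, int]:
    """Scalar (multiplications, additions) for one Strassen step on n x n (n even),
    with its 7 half-size products done by the classical algorithm."""
    h = n // 2
    mults = 7 * h ** 3
    # Each classical (h x h) product needs h*h*(h-1) additions, and Strassen
    # adds 18 elementwise additions/subtractions of h x h blocks on top.
    adds = 7 * h * h * (h - 1) + 18 * h * h
    return mults, adds


if __name__ == "__main__":
    # A dimension one tile wide versus one that spills slightly into a second tile.
    for k in (128, 130):
        print(f"k={k}: utilization {tile_utilization(1024, k, 1024):.1%}")

    n = 256
    print("classical:", classical(n), "total", sum(classical(n)))
    print("strassen :", strassen_one_level(n), "total", sum(strassen_one_level(n)))
```

With one dimension at 130 instead of 128, roughly half of the issued multiply-accumulates land on padding. For Strassen, one level at n = 256 trims the total scalar operation count by about 12%, but the 18 extra block additions are elementwise passes over memory rather than work for the matmul unit, which is one plausible reason the trade does not pay off on accelerators.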