Self-Attention Slimdown: From O(N²) to O(N·M), But Is It Worth It?

Interesting take on reducing self-attention complexity from O(N²) to O(N·M), although the results don't look eye-popping compared to SOTA models like DeiT. https://x.com/_akhaliq/status/1362221635571433481
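To make the O(N·M) idea concrete, here is a minimal sketch (not the linked paper's actual method) of one common way to get that cost: have the N input tokens attend to a small set of M learned latent tokens, so the score matrix is N×M instead of N×N. The class and parameter names (`LatentCrossAttention`, `num_latents`) are my own for illustration.

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    """Illustrative sketch: queries come from the N input tokens, keys/values
    from M learned latents, so attention scores are N x M rather than N x N."""
    def __init__(self, dim: int, num_latents: int = 64):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                   # x: (batch, N, dim)
        q = self.to_q(x)                    # queries from all N tokens
        k = self.to_k(self.latents)         # keys from M latent tokens
        v = self.to_v(self.latents)         # values from M latent tokens
        attn = (q @ k.t()) * self.scale     # (batch, N, M) score matrix
        attn = attn.softmax(dim=-1)
        return attn @ v                     # (batch, N, dim) output
```

With M fixed (say 64) and N the number of image patches, the quadratic blow-up in sequence length disappears, which is the appeal; the open question the tweet raises is whether the accuracy holds up against full-attention baselines like DeiT.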