Cache Me If You Can: Giving Transformers a Memory

·48 words·1 min

This idea looks very cool: extend a transformer with a large cache that stores data at inference time, with no weight changes. One can then feed the transformer a series of new facts, which are cached and used in subsequent inference. Memory is a key missing piece in current architectures. https://x.com/ChrSzegedy/status/1503906876416798722
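A minimal sketch of the flavor of mechanism being described, a retrieval memory that caches key/value pairs at inference time and serves nearest-neighbor lookups, without touching any weights. This is my own illustrative NumPy toy, not the implementation from the linked thread; the class and method names are hypothetical:

```python
import numpy as np

class KNNMemory:
    """External cache of (key, value) vectors written during inference.

    Model weights never change; new facts are stored by appending their
    key/value projections, and later queries retrieve them by similarity.
    """

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def add(self, keys: np.ndarray, values: np.ndarray) -> None:
        # Cache new facts: append their key/value rows to the store.
        self.keys = np.vstack([self.keys, keys])
        self.values = np.vstack([self.values, values])

    def retrieve(self, query: np.ndarray, k: int = 4) -> np.ndarray:
        # Dot-product similarity against all cached keys.
        scores = self.keys @ query
        # Indices of the k most similar cached entries.
        top = np.argsort(scores)[-k:]
        # Softmax over the top-k scores, then a weighted sum of values.
        weights = np.exp(scores[top] - scores[top].max())
        weights /= weights.sum()
        return weights @ self.values[top]
```

Feeding the model a new fact corresponds to `add`; a later attention step mixes in `retrieve(query)` alongside ordinary local attention. A real system would use an approximate nearest-neighbor index rather than a brute-force scan.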
