Transformer Telepathy: Estimating FLOPs with Just a Glance
How do you estimate the FLOPs, latency, and memory footprint of a transformer model just by looking at its architecture? Transformer Inference Arithmetic is a great post on how:
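To give a flavor of this kind of back-of-the-envelope estimate, here is a minimal Python sketch. It is not taken from the post itself: it assumes a standard decoder-only transformer with a 4x MLP expansion, ignores embeddings, biases, and layer norms, and assumes that single-stream decoding is bound by memory bandwidth. The function name `estimate_transformer` and all of its parameters are hypothetical, chosen only for illustration.

```python
def estimate_transformer(n_layers, d_model, bytes_per_value=2,
                         mem_bandwidth_bytes_per_s=1e12):
    """Rough per-token estimates for a decoder-only transformer.

    Assumptions (illustrative, not from the original post):
    - 4x MLP expansion, so each layer holds about 12 * d_model^2 weights
      (4 * d_model^2 for the Q, K, V, and output projections,
       8 * d_model^2 for the two MLP matrices)
    - embeddings, biases, and layer norms are ignored
    - decoding one token at a time is memory-bandwidth bound
    """
    # Approximate weight count across all layers.
    n_params = 12 * n_layers * d_model ** 2

    # Each weight contributes one multiply and one add per token.
    flops_per_token = 2 * n_params

    # Memory needed to hold the weights at the chosen precision.
    weight_bytes = n_params * bytes_per_value

    # KV cache: one key and one value vector of size d_model
    # per layer, per token.
    kv_cache_bytes_per_token = 2 * n_layers * d_model * bytes_per_value

    # Lower bound on per-token latency: every weight must be read
    # from memory once per decoded token.
    min_latency_s = weight_bytes / mem_bandwidth_bytes_per_s

    return {
        "n_params": n_params,
        "flops_per_token": flops_per_token,
        "weight_bytes": weight_bytes,
        "kv_cache_bytes_per_token": kv_cache_bytes_per_token,
        "min_latency_s": min_latency_s,
    }


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, d_model = 4096, 16-bit weights.
    for name, value in estimate_transformer(32, 4096).items():
        print(f"{name}: {value:,}")
```

For the hypothetical 32-layer, `d_model = 4096` configuration, this gives roughly 6.4 billion parameters and about 0.5 MB of KV cache per token at 16-bit precision. Real models deviate from these numbers depending on vocabulary size, attention variant, and MLP width, so treat the output as an order-of-magnitude guide.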