Skip to main content

Beam Me Up, Decoder!

·47 words·1 min

The exponential divergence for autoregressive models should be solvable via better decoding algorithms. The decoder is in fact a “planner”. Beam search is in fact a local planner. All practical control systems have an inner loop that produces finite k steps, just like AR models. https://x.com/ylecun/status/1640122342570336267

Discussion