Beam Me Up, Decoder!
·47 words·1 min
The exponential divergence for autoregressive models should be solvable via better decoding algorithms. The decoder is in fact a “planner”. Beam search is in fact a local planner. All practical control systems have an inner loop that produces finite k steps, just like AR models. https://x.com/ylecun/status/1640122342570336267