
Revenge of RL: How Instruct Paradigm Boosts ChatGPT


GPT-3's next-token prediction objective doesn't by itself account for the zero-shot task generalization seen in ChatGPT. The community seems to be converging on the instruct paradigm as the likely difference maker. It's the revenge of RL: a literal cherry on the cake, just like @ylecun predicted, but a very important cherry :). https://x.com/DrJimFan/status/1600884299435167745
