Revenge of RL: The Cherry on ChatGPT's Cake
GPT-3’s training objective doesn’t by itself explain the zero-shot task generalization seen in ChatGPT. The community seems to be converging on the instruct paradigm as the likely difference maker. It’s the Revenge of RL: a literal cherry on the cake, just as @ylecun predicted, but a very important cherry :). https://x.com/DrJimFan/status/1600884299435167745