Revenge of RL: The Cherry on ChatGPT's Cake

15 December 2022

GPT-3's training objective alone doesn't account for the zero-shot task generalization seen in ChatGPT. The community seems to be converging on the instruct paradigm as the likely difference maker. It's the Revenge of RL: a literal cherry on the cake, just as @ylecun predicted, but a very important cherry :). https://x.com/DrJimFan/status/1600884299435167745
