The OPT Disasterpiece: LLM Training Gone Wrong
I haven’t read a paper with more chaos and disasters during LLM training than OPT. Part of that may simply be that they bravely chose to include those details, but my guess is it mostly comes down to some questionable choices: the weight initialization, the linear LR schedule, ReLU activations, buggy loss scaling, and so on.
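
For context, the schedule in question is a linear warmup followed by a linear decay. Below is a minimal PyTorch sketch of that shape; the peak LR, warmup length, and total step count are illustrative assumptions, not the paper’s exact settings.

```python
import torch

# Assumed hyperparameters for illustration only -- not OPT's actual values.
MAX_LR = 1.2e-4        # assumed peak learning rate
WARMUP_STEPS = 2_000   # assumed linear warmup length
TOTAL_STEPS = 140_000  # assumed total training steps
MIN_LR_FRAC = 0.1      # decay down to 10% of the peak

def linear_warmup_decay(step: int) -> float:
    """Multiplicative LR factor: ramp up linearly, then decay linearly."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return max(MIN_LR_FRAC, 1.0 - (1.0 - MIN_LR_FRAC) * progress)

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, linear_warmup_decay)

for step in range(TOTAL_STEPS):
    optimizer.step()   # stand-in for the real forward/backward/update
    scheduler.step()   # advance the linear warmup-then-decay schedule
```

A fully linear ramp like this drops the LR fast early in decay, which is one reason cosine schedules became the more common default.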