The OPT Disasterpiece: LLM Training Gone Wrong
I haven’t read a paper with more chaos and disasters during LLM training than OPT. Part of that may simply be that they bravely chose to include those details, but my guess is it mostly comes down to some questionable choices: the weight initialization, the linear LR schedule, ReLU activations, buggy loss scaling, and so on.
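
For context, the schedule in question is a linear warmup followed by a linear decay. Below is a minimal PyTorch sketch of that shape; the peak LR, warmup length, and total step count are illustrative assumptions, not the paper’s exact settings.

```python
import torch

# Assumed hyperparameters for illustration only -- not OPT's actual values.
MAX_LR = 1.2e-4        # assumed peak learning rate
WARMUP_STEPS = 2_000   # assumed linear warmup length
TOTAL_STEPS = 140_000  # assumed total training steps
MIN_LR_FRAC = 0.1      # decay down to 10% of the peak

def linear_warmup_decay(step: int) -> float:
    """Multiplicative LR factor: ramp up linearly, then decay linearly."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return max(MIN_LR_FRAC, 1.0 - (1.0 - MIN_LR_FRAC) * progress)

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, linear_warmup_decay)

for step in range(TOTAL_STEPS):
    optimizer.step()   # stand-in for the real forward/backward/update
    scheduler.step()   # advance the linear warmup-then-decay schedule
```

A fully linear ramp like this drops the LR fast early in decay, which is one reason cosine schedules became the more common default.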