Learning Rate Warmup: The Hot New Stabilizer

“learning rate warmup can improve training stability just as much as batch normalization, layer normalization, MetaInit, GradInit, and Fixup initialization” https://x.com/Arxiv_Daily/status/1448198932949921801
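The claim concerns warmup schedules of the kind sketched below: a minimal illustration of linear learning-rate warmup, where the rate ramps from near zero to its base value over the first steps before training proceeds at full speed. Function names and hyperparameter values here are illustrative, not taken from the quoted work.

```python
def warmup_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 1000) -> float:
    """Return the learning rate for a given training step.

    During the first `warmup_steps` updates the rate ramps linearly
    from base_lr / warmup_steps up to base_lr; afterwards it is flat.
    The small early rates are what stabilize the initial updates.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear ramp
    return base_lr  # full rate after warmup


# Early steps use a tiny learning rate; later steps use the full base rate.
print(warmup_lr(0))     # smallest rate, at the start of training
print(warmup_lr(999))   # last warmup step, reaches base_lr
print(warmup_lr(5000))  # well past warmup, stays at base_lr
```

In practice this schedule is usually composed with a decay (e.g. cosine or inverse square root) applied after the warmup phase.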
