Weight Norm's Revenge: Equivalence with Batch Norm and L2 Reg

This is a fantastic paper on the equivalence between batch norm, weight decay (L2 regularization), and weight norm. This insight then leads to making weight norm actually work for large networks and even LSTMs, all with much less computation! https://arxiv.org/abs/1803.01814
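To make the connection a bit more concrete, here is a minimal sketch of the scale invariance that ties batch norm to weight norm: once a linear unit's output is batch-normalized, rescaling the weight vector does not change the result, so only the weight direction matters. This is my own toy illustration in plain Python, not the authors' code; the helper names (`linear`, `batch_norm`) and the random data are made up for the example, and epsilon is set to zero so the invariance is exact.

```python
import math
import random


def linear(weights, batch):
    """Pre-activation w . x for every row x in the batch."""
    return [sum(w * x for w, x in zip(weights, row)) for row in batch]


def batch_norm(values):
    """Normalize values to zero mean and unit variance (no epsilon, no affine)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]


random.seed(0)
# Hypothetical toy data: 8 examples with 4 features each.
batch = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
weights = [random.gauss(0, 1) for _ in range(4)]

# Same direction, different norms: scaled by 3, and normalized to unit length.
scaled = [3.0 * w for w in weights]
norm = math.sqrt(sum(w * w for w in weights))
unit = [w / norm for w in weights]

out = batch_norm(linear(weights, batch))
out_scaled = batch_norm(linear(scaled, batch))
out_unit = batch_norm(linear(unit, batch))

# Both differences should be at floating-point noise level.
max_diff_scaled = max(abs(a - b) for a, b in zip(out, out_scaled))
max_diff_unit = max(abs(a - b) for a, b in zip(out, out_unit))
print(max_diff_scaled, max_diff_unit)
```

Since the normalized output ignores the weight norm, an L2 penalty on such weights cannot change what the layer computes directly; as I read the paper, its effect shows up through the effective step size instead.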

I contacted the authors for the code, and they mentioned that it's here: http://github.com/paper-submissions/norm_matters