Weight Norm's Revenge: Equivalence with Batch Norm and L2 Reg
This is a fantastic paper on the equivalence between batch norm, weight decay (L2 regularization), and weight norm. This insight then leads to making weight norm actually work for large networks and even LSTMs, all with much less computation! https://arxiv.org/abs/1803.01814
I contacted the authors for the code, and they mentioned that it's here: http://github.com/paper-submissions/norm_matters
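
To make the connection between the three techniques a bit more concrete, here is a small NumPy sketch of the property that, as I understand it, ties them together: a linear layer followed by batch norm does not care about the scale of its weights, and weight norm makes that same direction/scale split explicit. This is my own illustration, not the authors' code; the helper names `batch_norm` and `weight_norm`, the shapes, and the scaling factors are all assumptions chosen for the demo.

```python
import numpy as np

def batch_norm(z, eps=0.0):
    """Normalize each feature (column) over the batch; no learned scale/shift."""
    mean = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def weight_norm(v, g):
    """Weight-norm reparameterization per output unit: w = g * v / ||v||."""
    return g * v / np.linalg.norm(v, axis=0, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))   # hypothetical batch: 32 samples, 8 features
w = rng.normal(size=(8, 4))    # hypothetical linear layer with 4 output units

# Shrinking the weights (which is what weight decay does over time)
# leaves the batch-normalized output unchanged.
out = batch_norm(x @ w)
out_scaled = batch_norm(x @ (0.1 * w))
scale_invariant = np.allclose(out, out_scaled)

# Weight norm keeps only the direction of v; the scale lives in g.
w_a = weight_norm(w, g=1.0)
w_b = weight_norm(5.0 * w, g=1.0)
direction_only = np.allclose(w_a, w_b)

print(scale_invariant, direction_only)
```

Since the forward pass ignores the weight scale in both cases, a penalty on the weight norm cannot be acting as a conventional regularizer on the function itself; the sketch only demonstrates that invariance and does not reproduce the paper's experiments or its exact normalization scheme.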