Batch Norms? Who Needs Them! Deep Nets Shine Without BNs
Batch norm (BN) has enabled training of very deep networks, but it comes with problems of its own. In the past, some works succeeded in training deep nets without BN by using judicious init schemes, although the resulting models generalized poorly. The work linked below is much more promising and works for complex archs like EfficientNet. https://x.com/ajmooch/status/1352614051352899585