Deep Learning's Log-Linear Scaling Secrets
Deep Learning Scaling is Predictable, Empirically - https://arxiv.org/abs/1712.00409
-
Test loss reduces log linearly with training data size.
-
Model parameters needs to be increased log linearly with training data size.
Roughly, ResNet parameters in millions = sqrt(data size)*2.6