Learning Rate Hydra: Eliminating One Hyperparameter Breeds More
It seems that every effort to eliminate one SGD hyperparameter has eventually produced more hyperparameters. In deep learning, the learning rate is like a Hydra's head: cut one off and more grow in its place.
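One way to picture this, as a rough sketch rather than anything the post itself spells out: compare a plain SGD step with an Adam-style step, an adaptive method often used to make training less sensitive to the learning rate. The function names `sgd_step` and `adam_step` are hypothetical, the updates act on a single scalar weight, and the defaults are the commonly quoted Adam values. Choosing Adam as the example is an assumption.

```python
import math


def sgd_step(w, grad, lr=0.01):
    """Plain SGD on a scalar weight: one hyperparameter (lr)."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style update on a scalar weight.

    Hyperparameters: lr, beta1, beta2, eps (four instead of one).
    m and v are running moment estimates; t is the 1-based step count.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Usage: one step of each from the same starting point.
w_sgd = sgd_step(1.0, 2.0, lr=0.1)
w_adam, m, v = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1)
```

The adaptive step still takes a learning rate, and it now also carries two decay rates and an epsilon.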