Learning Rate Hydra: Eliminating One Hyperparameter Breeds More
It seems that every effort to eliminate one sgd hyperparamter eventually has produced more hyperparameters. Learning rate is like Hydra’s head in deep learning.
It seems that every effort to eliminate one sgd hyperparamter eventually has produced more hyperparameters. Learning rate is like Hydra’s head in deep learning.