Learning Rate Hydra: Eliminating One Hyperparameter Breeds More

It seems that every effort to eliminate one SGD hyperparameter has eventually produced more hyperparameters. The learning rate is deep learning's Hydra head: cut it off and more grow back in its place.
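One familiar instance, offered as an illustration rather than as the example the post had in mind: plain SGD exposes a single knob, while Adam, which adapts the step size per parameter, exposes four. The sketch below is a minimal scalar version using Adam's commonly cited default values; the helper `hyperparameters` is a hypothetical name, and counting keyword defaults is only a rough proxy for "tunable hyperparameters".

```python
import inspect
import math


def sgd_step(w, grad, lr=0.1):
    """Plain SGD on a scalar parameter: one tunable knob, the learning rate."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam on a scalar parameter: adaptive step size, but four tunable knobs."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def hyperparameters(fn):
    """Hypothetical helper: list parameters that have defaults (our proxy for knobs)."""
    return [
        name
        for name, p in inspect.signature(fn).parameters.items()
        if p.default is not inspect.Parameter.empty
    ]


if __name__ == "__main__":
    print("SGD :", hyperparameters(sgd_step))
    print("Adam:", hyperparameters(adam_step))
```

Even here the learning rate itself survives in Adam's signature, now accompanied by `beta1`, `beta2`, and `eps`.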
