Paperception: Unveiling Post-Hoc EMA Tuning
Here is a paper hidden inside a paper! As many folks doing LLM training know, EMA (an exponential moving average of the model weights) can improve models significantly, but tuning it is hard because training runs are very expensive. Here the authors have figured out how to tune it post-hoc, after training has finished! https://arxiv.org/abs/2312.02696
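To make the tuning problem concrete, here is a minimal sketch of conventional weight EMA in plain Python. It is not the paper's post-hoc method, just the usual setup it improves on. The names `ema_update`, `train_with_ema`, and `trajectory` are made up for illustration, and the "weights" are a toy one-element list standing in for a real model.

```python
def ema_update(avg, weights, decay):
    """Return a new running average blended with the current weights."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, weights)]


def train_with_ema(weight_trajectory, decay):
    """Fold a sequence of weight snapshots into a single EMA.

    Assumption: weight_trajectory stands in for the weights after each
    training step. Real training normally does not keep this trajectory,
    so the decay has to be chosen before the run starts.
    """
    avg = list(weight_trajectory[0])
    for weights in weight_trajectory[1:]:
        avg = ema_update(avg, weights, decay)
    return avg


# Toy trajectory: a single weight drifting from 0 to 3 over four steps.
trajectory = [[0.0], [1.0], [2.0], [3.0]]

# Different decay values give different averaged weights.
for decay in (0.5, 0.9):
    print(decay, train_with_ema(trajectory, decay))
```

Each decay value produces a different final average, and in a real run each candidate has to be tracked during training (or the run repeated), which is what makes a post-hoc approach attractive.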