Paperception: Unveiling Post-Hoc EMA Tuning

Here is a paper hidden inside a paper! As many folks doing LLM training know, EMA (an exponential moving average of the model weights) can improve models significantly, but tuning it is hard because every training run is very expensive. Here the authors have figured out how to do it post hoc, after training has finished! https://arxiv.org/abs/2312.02696
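For readers who have not used EMA, here is a minimal sketch of the conventional version, where the decay has to be picked before training starts. This is not the paper's post-hoc method. The names `ema_update` and `run_ema`, the toy "training trajectory", and the decay values are all assumptions made up for illustration.

```python
import numpy as np


def ema_update(ema, weights, decay):
    """One conventional EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return decay * ema + (1.0 - decay) * weights


def run_ema(weight_history, decay):
    """Fold a sequence of weight vectors into a single EMA, oldest first."""
    ema = weight_history[0].copy()
    for weights in weight_history[1:]:
        ema = ema_update(ema, weights, decay)
    return ema


# Hypothetical stand-in for a training run: noisy weights around a fixed target.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
weight_history = [target + 0.1 * rng.standard_normal(3) for _ in range(1000)]

# Comparing decays is cheap here only because every step's weights are stored.
for decay in (0.9, 0.99, 0.999):
    ema = run_ema(weight_history, decay)
    print(f"decay={decay}: max abs error = {np.abs(ema - target).max():.4f}")
```

In a real run nobody keeps the weights from every step, so in the conventional setup each decay value you want to compare has to be tracked during training or costs another run. That is the expense the post-hoc approach is meant to avoid.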
