DeepSeek's Recipe Evolution

27 January 2025·46 words·1 min

Interestingly, DeepSeek has been using RL with GRPO for reasoning since early 2024, but the results weren’t as impressive.

So what changed?

And boom!