
Phixtral Fusion: Phi-2 and Pre-trained Experts Smash the Leaderboard


It’s simply amazing how the OSS community is using Phi-2. Goddard figured out that you can just slap pre-trained models in as “experts” in a Mixtral-style MoE. For routing, the gate matrix is computed directly from the hidden states of each expert’s prompt.
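
For the curious, here is a minimal sketch of that routing trick, assuming a Hugging Face transformers stack. The expert prompts, the last-layer hidden state, and last-token pooling are my own illustrative assumptions, not necessarily the exact recipe in Goddard’s post (which works per MoE layer).

```python
# Sketch: build a gate matrix from hidden states of per-expert prompts.
# Assumes microsoft/phi-2 as the shared base model and hypothetical prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# One short prompt describing what each expert should handle (illustrative).
expert_prompts = [
    "Write Python code that solves the following problem.",
    "Explain the answer step by step with careful reasoning.",
]

gate_rows = []
for prompt in expert_prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Last-layer hidden state of the final token stands in for this expert's "direction".
    gate_rows.append(out.hidden_states[-1][0, -1])

# Shape (num_experts, hidden_dim): at inference, router logits are
# hidden_state @ gate.T, then softmax / top-k picks which expert FFNs to run.
gate = torch.stack(gate_rows)
print(gate.shape)
```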

Phixtral smashes the leaderboard. No extra training!! https://x.com/maximelabonne/status/1744867841436700850

Shout out to the fun article by Charles Goddard on how this strange “MoE” works.

Most folks will immediately recognize this as an ensemble with one weird trick :).

https://goddard.blog/posts/clown-moe/#moe-gates-without-training
