X-MoE: Scaling DeepSeek-style MoEs on Frontier—what broke, what we fixed, and what we learned
This post presents the background and key optimizations behind X-MoE, along with our hands-on experience scaling MoE model training on Frontier, the AMD GPU-powered supercomputer.
August 24, 2025 · 7 minutes