X-MoE: Scaling DeepSeek-style MoEs on Frontier—what broke, what we fixed, and what to learn
2025-08-24
This blog post covers the background and key optimizations behind X-MoE, along with our hands-on experience scaling MoE model training on Frontier, an AMD GPU-based supercomputer.