Energy-Efficiency

VoltanaLLM: Feedback-Driven Frequency Control and Routing for Energy-Efficient LLM Serving

This post presents the motivation, insights, and key optimizations behind VoltanaLLM, our system for energy-efficient LLM inference. We walk through why energy matters, where conventional GPU frequency scaling falls short, the surprising behaviors we uncovered when profiling LLM serving, the unique opportunities created by prefill/decode (P/D) disaggregated serving, and how VoltanaLLM’s co-design of frequency control and routing achieves up to 36.3% GPU energy savings while maintaining near-perfect Service Level Objective (SLO) attainment.

September 14, 2025 · 9-minute read