FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting

ACM MM 2025

Boyuan Tian, Qizhe Gao, Siran Xianyu, Xiaotong Cui, Minjia Zhang

University of Illinois Urbana-Champaign

Novel views synthesized from our compressed versus uncompressed 3DGS models — Bicycle (MipNeRF-360).

Novel views synthesized from our compressed versus uncompressed 3DGS models — Garden (MipNeRF-360).

Compressing 3DGS Can Be Lightweight, Flexible, and Mobile-Ready — No Training Required

3D Gaussian Splatting (3DGS) has rapidly become the de facto representation for 3D scenes. It strikes a rare balance—offering higher visual quality than meshes, voxels, or surfels, and real-time rendering speeds that outperform NeRF-like methods. Its success is evident in applications like novel view synthesis, avatars, animation, and physics simulation, where high fidelity and speed are crucial.

But this power comes with a catch: massive file sizes.

Because 3DGS explicitly models the physical world by fitting it with millions of 3D Gaussians—each treated as a point with view-dependent attributes like color, orientation, and scale—the resulting model is a specialized, attribute-rich point cloud. Each Gaussian is an oriented ellipsoid that is splatted into the target view as a 2D ellipse and alpha-blended in depth order—conceptually similar to triangle rasterization in traditional graphics.

These large files are problematic, especially for edge devices with limited storage and memory, and for scenarios involving network transmission, where bandwidth and latency matter.

The Problem: Compression That Doesn’t Scale

Compressing 3DGS models is a natural solution—and researchers have explored techniques like:

  • Pruning
  • Quantization
  • Entropy coding
  • Compact data structures (e.g., hash tables and grids)

While effective, most existing approaches depend heavily on retraining or refinement steps to restore lost quality. These steps are resource-intensive and often require specialized software frameworks or hardware setups, making them impractical for large-scale or high-resolution scenes (see Fig. 1).

Scalability challenges in training 3D Gaussian Splatting models
Figure 1: Scaling 3D Gaussian Splatting to large, high-resolution scenes pushes both software and hardware limits. Training pipelines have evolved from single-GPU setups (e.g., 3DGS) to distributed systems (e.g., Grendel-GS, RetinaGS), while Gaussian counts and pixel budgets per scene have increased 20–100× — making training-involved compression hard to scale.

Even worse, if you want a different compression ratio or quality target, you typically need to tweak parameters and rerun the pipeline from scratch—an inflexible and time-consuming process, especially when dealing with fluctuating bandwidth or diverse user hardware capabilities.

Can We Compress Without Training?

What if we eliminate training altogether?

This is challenging—no learning means no opportunity for a model to discover compact representations or compensate for quality loss. Inspired by zero-shot techniques used in language model compression, we experimented with naive pruning and scalar quantization. However, they struggled to effectively reduce redundancy while preserving quality.

Then came a key insight: parameter importance in 3DGS is highly skewed—not just across entire Gaussian primitives, but across their individual attributes.

For example:

  • Some 3D Gaussians cover large areas with high opacity, making them critical to preserve, while others contribute little to the final image.
  • Some attributes (like position) have high visual impact, while others (like minor color variations) are more tolerant to errors.
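As a concrete illustration, a simple proxy for per-Gaussian importance could combine opacity with the Gaussian's spatial footprint. The weighting below is a hypothetical sketch for intuition, not the exact scoring function used by FlexGaussian:

```python
import numpy as np

def importance_scores(opacity, scale):
    """Hypothetical per-Gaussian importance: opacity times an approximate
    world-space footprint (product of the three scale axes)."""
    volume = np.prod(scale, axis=1)   # rough size of each Gaussian
    return opacity * volume           # large and opaque -> important

# toy example: 4 Gaussians with opacity in [0, 1] and 3D axis scales
opacity = np.array([0.9, 0.05, 0.8, 0.3])
scale = np.array([[1.0, 1.0, 1.0],
                  [0.1, 0.1, 0.1],
                  [2.0, 0.5, 0.5],
                  [0.2, 0.2, 0.2]])
scores = importance_scores(opacity, scale)
```

Under such a scoring, the tiny, nearly transparent second Gaussian would be the first pruning candidate, while the large, opaque first one would be preserved.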

Unfortunately, prior methods either ignored these varying sensitivities—applying the same bit-widths or pruning ratios across the board—or required costly retraining or refinement to learn them.

We found that while importance for Gaussian primitives is scene-dependent, the error sensitivity pattern across attributes is largely scene-agnostic, enabling a middle-ground solution that avoids retraining altogether.

FlexGaussian: Training-Free Compression with Attribute-Sensitivity-Aware Pruning and Quantization

FlexGaussian compresses pre-trained 3DGS models using:

  • Pruning: selectively removing entire or partial Gaussians based on importance
  • Quantization: applying round-to-nearest (RTN) scalar quantization to each attribute, using different bit-widths depending on tolerance to error
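RTN quantization of a single attribute channel can be sketched as follows. The INT8/INT4 bit-widths mirror the mixed-precision scheme described later; which channel gets which width is an illustrative choice here, not the paper's exact assignment:

```python
import numpy as np

def rtn_quantize(x, bits):
    """Round-to-nearest scalar quantization of one attribute channel."""
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << bits) - 1                        # e.g. 255 for INT8
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.int32)
    return q, lo, scale

def rtn_dequantize(q, lo, scale):
    return q * scale + lo

# error-tolerant channels get fewer bits (INT4), sensitive ones more (INT8)
x = np.linspace(-1.0, 1.0, 1000)
q8, lo8, s8 = rtn_quantize(x, 8)
q4, lo4, s4 = rtn_quantize(x, 4)
err8 = np.abs(rtn_dequantize(q8, lo8, s8) - x).max()
err4 = np.abs(rtn_dequantize(q4, lo4, s4) - x).max()
```

The maximum reconstruction error is bounded by half the quantization step, so halving the bit-width roughly multiplies the worst-case error by 16—which is exactly why error-sensitive attributes need the wider format.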

We introduce two pruning parameters:

  • P_row – controls row-wise pruning
  • P_sh – controls pruning of higher-order spherical-harmonics (SH) color coefficients
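The two knobs can be thought of as quantile thresholds over the importance scores: P_row drops whole low-importance Gaussians, while P_sh strips higher-order SH coefficients from a further fraction of the survivors. The function below is a sketch under assumed data layout (attributes as an N×D matrix with SH coefficients in the trailing columns); the helper name and layout are illustrative, not FlexGaussian's actual code:

```python
import numpy as np

def prune(gaussians, scores, p_row, p_sh, sh_dim=48, base_dim=3):
    """Semi-structured pruning sketch.
    gaussians: (N, D) attribute matrix whose last sh_dim columns hold SH
    coefficients (the first base_dim of them are the DC/base color).
    p_row: fraction of whole Gaussians to drop.
    p_sh:  fraction of kept Gaussians whose higher-order SH are zeroed."""
    order = np.argsort(scores)               # ascending importance
    n = len(scores)
    keep = order[int(n * p_row):]            # drop lowest-importance rows
    pruned = gaussians[keep].copy()
    # zero high-order SH terms for the least important kept Gaussians
    m = int(len(keep) * p_sh)
    pruned[:m, -(sh_dim - base_dim):] = 0.0
    return pruned

# toy run: 10 Gaussians, 50 attributes each, importance = index
g = np.ones((10, 50))
scores = np.arange(10.0)
out = prune(g, scores, p_row=0.2, p_sh=0.5, sh_dim=48, base_dim=3)
```

Zeroed high-order SH blocks compress well and can be stored compactly, which is why this "partial" pruning is worthwhile even when the Gaussian itself is kept.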

Both strategies individually achieve ~6× compression with only 0.2–0.4 dB drop in PSNR, depending on the scene. However, combining pruning and quantization is tricky:

  • It amplifies compression, but also compounds quality loss
  • While attribute importance is consistent across scenes, Gaussian importance is scene-dependent, meaning that the same P_row and P_sh settings can result in very different outcomes

Here’s the advantage of our training-free approach: even though optimal parameters vary per scene, evaluating them is fast and cheap.

Through profiling, we observed that only a limited number of pruning-quantization combinations lie on the Pareto-optimal frontier for quality and compression. This allows us to search efficiently over this small set to find the best setting. We call this system FlexGaussian.

How FlexGaussian Works

FlexGaussian starts with a pre-trained 3DGS model and a user-specified compression target—either a quality threshold (e.g., max PSNR drop) or a file size goal.

Putting it all together, FlexGaussian runs the following pipeline (shown in Fig. 2):

  1. Compute importance scores for each Gaussian primitive
    • This step is modular and can be replaced with future improvements
  2. For each candidate pair of pruning parameters (P_row, P_sh):
    • Apply pruning
    • Apply scalar quantization with attribute-specific bit-widths
    • Dequantize and render the scene
    • Evaluate quality (e.g., PSNR)
  3. Loop until a valid solution is found or all combinations are exhausted

Figure 2: FlexGaussian first applies attribute-discriminative pruning (ADP) to obtain semi-structurally pruned Gaussian primitives. It then employs INT8/INT4 channel-wise mixed-precision quantization (MPQ) to further reduce the model size along the bit dimension. Finally, FlexGaussian introduces a novel lightweight online adaptation (FOA) algorithm to adaptively adjust the compression ratio for diverse scenes on hardware with different efficiency constraints.
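The adaptation loop above can be sketched as a short search over candidate settings. The function names (`render_and_psnr`, the prune/quantize callables) stand in for the real pipeline stages, and the toy candidate list plays the role of the small Pareto-frontier set:

```python
def search_compression(model, candidates, min_psnr, render_and_psnr,
                       prune_fn, quantize_fn):
    """Sketch of the FlexGaussian adaptation loop: try Pareto-frontier
    (P_row, P_sh) settings and stop at the first that meets the target."""
    for p_row, p_sh in candidates:
        compressed = quantize_fn(prune_fn(model, p_row, p_sh))
        psnr = render_and_psnr(compressed)     # render + evaluate quality
        if psnr >= min_psnr:
            return compressed, (p_row, p_sh)   # target met: stop early
    return None, None                          # all candidates exhausted

# toy stand-ins: "model" is a scalar size; more pruning -> lower "PSNR"
toy_prune = lambda m, pr, ps: m * (1.0 - pr)
toy_quant = lambda m: m
toy_psnr = lambda m: 20.0 + m
best, setting = search_compression(
    10.0, [(0.8, 0.8), (0.5, 0.5), (0.2, 0.2)], 25.0,
    toy_psnr, toy_prune, toy_quant)
```

Ordering candidates from most to least aggressive means the first hit is also the smallest model that satisfies the quality target; a file-size target would simply flip the acceptance test.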

This full loop—prune → quantize → render → evaluate—takes just 1–2 seconds, enabling real-time adaptation to different compression needs without training (see Fig. 3).

FlexGaussian compression pipeline
Figure 3: The execution pipeline prunes low-importance primitives, applies RTN-based quantization and dequantization with attribute-specific bit-widths, renders the output, and evaluates quality—completing the loop in 1–2 seconds per step. Each step corresponds to a different compression ratio and quality drop, enabling flexible, training-free compression under diverse requirements.

Results: Fast, Flexible, and Powerful

We evaluated FlexGaussian across multiple benchmarks:

  • Mip-NeRF360, Tanks & Temples, Deep Blending

We compared against:

  • Retraining-based methods
  • Refinement-based methods
  • FCGS – the only known training-free state-of-the-art (SOTA)

On a desktop with NVIDIA RTX 3090, FlexGaussian achieved:

  • Up to 96.4% file size reduction
  • <1 dB quality loss
  • 1.7–2.1× faster than FCGS
  • Orders of magnitude faster than training-based approaches

Quality comparisons are shown in the interactive zoom-in results and rendered video. Most notably, FlexGaussian's training-free, low-cost design makes it deployable on mobile-class devices like Jetson AGX Xavier, expanding its usability beyond desktops and into real-world edge deployments.

Closing Thoughts

FlexGaussian strikes a powerful balance between compression performance, visual quality, and computational cost—all without retraining. It offers a fast, flexible, and scalable solution for compressing 3D Gaussian Splatting, paving the way for deployment across distributed edge devices and enabling efficient 3DGS streaming.

Qualitative Comparisons


GroundTruth

Ours

3DGS


BibTeX

@inproceedings{FlexGaussian,
        title={FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting},
        author={Tian, Boyuan and Gao, Qizhe and Xianyu, Siran and Cui, Xiaotong and Zhang, Minjia},
        booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
        year={2025}
}