
Cache Saver is a modular, plug-and-play framework for high-level LLM inference optimizations. It uses a namespace-aware, list-valued cache to preserve statistical integrity and reproducibility while reusing responses, reducing inference costs by ~25% and CO₂ emissions by ~35% on average.
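The sketch below illustrates the general idea of a namespace-aware, list-valued cache; it is not the Cache Saver API. Responses for a prompt are stored as a list of samples, and each namespace tracks how many of those samples it has already consumed, so samples are reused *across* namespaces but never repeated *within* one. All names (`ListValuedCache`, `sample`, `generate`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

class ListValuedCache:
    """Minimal sketch of a namespace-aware, list-valued cache (hypothetical API).

    Samples for a prompt are cached as a list. Each (namespace, prompt) pair
    keeps a cursor into that list, so a namespace never sees the same sample
    twice -- samples stay i.i.d. within a namespace, preserving statistical
    integrity, while other namespaces can still reuse the cached work.
    """

    def __init__(self, generate: Callable[[str], str]):
        self._generate = generate  # underlying stochastic LLM call
        self._samples: Dict[str, List[str]] = defaultdict(list)        # prompt -> cached samples
        self._cursor: Dict[Tuple[str, str], int] = defaultdict(int)    # (namespace, prompt) -> next index

    def sample(self, namespace: str, prompt: str, n: int = 1) -> List[str]:
        """Return n samples for `prompt`, reusing cached samples where possible."""
        start = self._cursor[(namespace, prompt)]
        pool = self._samples[prompt]
        # Cache miss: generate only the samples this namespace has not yet consumed.
        while len(pool) < start + n:
            pool.append(self._generate(prompt))
        self._cursor[(namespace, prompt)] = start + n
        return pool[start:start + n]
```

For example, a second experiment querying the same prompt reuses the first experiment's samples at zero cost, while repeated draws inside one experiment always trigger fresh LLM calls:

```python
import random

cache = ListValuedCache(generate=lambda p: f"answer-{random.randint(0, 9)}")

run_a = cache.sample("experiment-A", "Solve 2+2", n=3)  # 3 fresh LLM calls
run_b = cache.sample("experiment-B", "Solve 2+2", n=3)  # 0 calls: reuses run A's samples
again = cache.sample("experiment-A", "Solve 2+2", n=2)  # 2 fresh calls: A already consumed 3
```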
## Research Goal

Efficient and reproducible LLM inference.


