Hailstorm: balancing the load in LSM-based distributed databases

Hailstorm is a system designed to improve load balance and resource utilization in existing LSM-based distributed databases. Hailstorm is deployed as a filesystem underneath LSM storage engines such as RocksDB and spreads all data uniformly across all storage devices within a rack in fine-grained blocks. We subsequently leverage this storage design to enable database instances with a high load to offload compaction tasks to other less utilized machines in the system. Hailstorm improves throughput by 2X for write-heavy workloads in MongoDB, 22X for range scans, and provides 50% performance improvements in TPC-C and TPC-E on TiDB.

For more details, you can find the paper here.