The use of machine learning to tune data management systems or synthesize components tailored to a specific problem instance has recently become a popular research direction. Such “learned” systems can automatically adapt to new workloads, data, or hardware without time-consuming tuning by humans, thereby dramatically reducing the cost of data analytics and making it more accessible. However, although learned systems and components have shown orders-of-magnitude performance improvements under laboratory conditions, it remains largely unclear how these numbers will hold up in more realistic production environments. In particular, traditional benchmarks such as TPC or YCSB, which evaluate performance under a stable workload and data distribution, are insufficient to characterize these new systems because learned systems can overfit to a fixed benchmark. As a result, companies are often reluctant to incorporate learned techniques into mainstream systems due to a lack of evidence of how they would perform under varying conditions.
This work proposes new ideas for benchmarking learned systems. New benchmarks should abstain from using fixed workloads and data distributions, as their characteristics are easy to learn. Similarly, they should strive to measure adaptability through descriptive statistics and outliers rather than average metrics that “hide” too much information. We also propose techniques and metrics to incorporate model training and cost savings into benchmark results, two key characteristics that can no longer be ignored in learned systems. We are currently implementing these ideas in a new benchmarking suite.
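To illustrate why averages can “hide” adaptability problems, here is a minimal sketch (not from the paper; the latency values and system names are assumed for illustration): two systems with identical mean query latency can behave very differently in the tail, for example when a learned component stalls while retraining.

```python
# Hypothetical example: averages hide tail behavior.
# All latency values below are assumed, purely for illustration.
import statistics

# Simulated per-query latencies in milliseconds.
stable_system = [10.0] * 100                   # constant latency
learned_system = [5.0] * 95 + [105.0] * 5      # fast, but occasional stalls

def mean(xs):
    return statistics.fmean(xs)

def percentile(xs, p):
    """Nearest-rank percentile of a sample."""
    xs = sorted(xs)
    k = max(0, min(len(xs) - 1, round(p / 100 * len(xs)) - 1))
    return xs[k]

for name, lat in [("stable", stable_system), ("learned", learned_system)]:
    print(f"{name}: mean={mean(lat):.1f} ms, "
          f"p50={percentile(lat, 50):.1f} ms, "
          f"p99={percentile(lat, 99):.1f} ms, "
          f"max={max(lat):.1f} ms")
```

Both systems report a mean of 10 ms, yet the learned system’s p99 and maximum latency reveal the stalls a benchmark based on averages would miss; this is the kind of distributional reporting the proposed benchmarks advocate.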
You can read the paper here.