- Taming Skew in Large Scale Analytics
- Hurricane, big data, analytics, cluster computing, skew, high performance, task cloning, adaptive work partitioning, merging, repartitioning, load balancing, storage disaggregation, decentralized storage, bags, chunks, fine-grained partitioning, distributed scheduling, batch sampling, late binding.
Hurricane is a high-performance large-scale data analytics system that successfully tames skew in novel ways. Hurricane performs adaptive work partitioning based on load observed by nodes at runtime. Overloaded nodes can spawn clones of their tasks at any point during their execution, with each clone processing a subset of the original data.