Presentation, June 23, 2018

Hurricane: Kicking Skew Out of Analytics

Hurricane is a high-performance distributed analytics system designed to handle data skew effectively.

Key Innovation

Hurricane performs adaptive work partitioning based on load observed by nodes at runtime. When a node becomes overloaded, it can spawn task clones during execution, with each clone processing a subset of the data.
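As a rough illustration of the cloning idea, the sketch below has one task and its clones drain a shared pool of data chunks, so each clone ends up processing a subset of the data and the partial results are merged at the end. This is a toy model and not Hurricane's implementation: the function `run_task_with_clones`, the in-memory `bag` queue, and the use of threads in place of cluster nodes are all assumptions made for the example.

```python
import queue
import threading


def run_task_with_clones(chunks, num_clones):
    """Sum all values in `chunks` using `num_clones` cooperating workers.

    Hypothetical sketch: threads stand in for task clones, and a
    thread-safe queue stands in for the shared input data.
    """
    bag = queue.Queue()
    for chunk in chunks:
        bag.put(chunk)

    # Each clone writes only to its own slot, so no lock is needed here.
    partial_sums = [0] * num_clones

    def clone(idx):
        # Keep pulling chunks until the shared bag is empty.
        while True:
            try:
                chunk = bag.get_nowait()
            except queue.Empty:
                return
            partial_sums[idx] += sum(chunk)

    workers = [threading.Thread(target=clone, args=(i,)) for i in range(num_clones)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # Merge the partial results produced by the clones.
    return sum(partial_sums)


if __name__ == "__main__":
    data = [[1, 2], [3], [4, 5, 6]]
    print(run_task_with_clones(data, num_clones=3))
```

Which clone handles which chunk varies from run to run, but the merged result does not, because every chunk is taken from the bag exactly once.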

Core Features

  • Dynamic parallelism adjustment to gracefully manage skew
  • Decentralized data spreading across all nodes
  • Automatic load balancing to ensure fast completion times

How It Works

Rather than statically partitioning work upfront, Hurricane monitors load at runtime and adjusts the degree of parallelism as execution proceeds. Overloaded nodes create task clones that share the workload, while underutilized nodes pick up additional work through work stealing.

This approach lets Hurricane keep CPU and storage utilization high even when the data distribution is highly skewed.
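To see why runtime adjustment helps under skew, the following toy simulation compares a static one-partition-per-node assignment with a scheme in which idle nodes take chunks from the most loaded partition. It is a sketch under simplifying assumptions (every node processes one chunk per time step, and cloning has no cost), not a model of Hurricane's actual scheduler; the function `simulate` and its parameters are hypothetical.

```python
def simulate(partition_sizes, allow_clones):
    """Return the number of time steps needed to process all partitions.

    partition_sizes[i] is the number of chunks initially assigned to node i.
    Each node processes at most one chunk per step.
    """
    remaining = list(partition_sizes)
    steps = 0
    while any(remaining):
        steps += 1
        # Chunks claimed from each partition during this step.
        claimed = [0] * len(remaining)
        for node in range(len(remaining)):
            if remaining[node] - claimed[node] > 0:
                # The node still has work of its own.
                target = node
            elif allow_clones:
                # Idle node: act as a clone of the most loaded task.
                target = max(
                    range(len(remaining)),
                    key=lambda p: remaining[p] - claimed[p],
                )
                if remaining[target] - claimed[target] <= 0:
                    continue  # nothing left to take this step
            else:
                continue  # static partitioning: idle nodes stay idle
            claimed[target] += 1
        remaining = [r - c for r, c in zip(remaining, claimed)]
    return steps


if __name__ == "__main__":
    skewed = [8, 1, 1, 2]  # one partition holds most of the data
    print("static:", simulate(skewed, allow_clones=False))
    print("with clones:", simulate(skewed, allow_clones=True))
```

With the skewed input above, the static assignment takes 8 steps because three nodes sit idle while one finishes the large partition. With cloning, the 12 chunks are spread over all 4 nodes and the run finishes in 3 steps.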

For comprehensive technical details, see our EuroSys 2018 paper.
