Tesseract is a distributed system designed for graph pattern mining on evolving graphs. It targets the computational cost of finding pattern matches in graph-structured data, a problem that arises in social networks, chemistry, fraud detection, and semantic web applications.
Key Technical Contributions
The system implements a novel change detection algorithm that efficiently determines the exact changes to the mining results caused by each graph update. Rather than recomputing results from scratch after the graph changes, Tesseract decomposes the update stream into per-update mining tasks that are distributed across workers.
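To make the idea of per-update change detection concrete, the following is a minimal sketch restricted to a single pattern (triangles) on an undirected graph. It is not Tesseract's algorithm: the class name `IncrementalTriangles` and its methods are hypothetical, and the sketch only illustrates the general principle that an edge update can change only those matches that contain the updated edge, so exploring from that edge is enough to obtain the exact delta.

```python
from collections import defaultdict


class IncrementalTriangles:
    """Hypothetical illustration: maintains the set of triangles under edge updates."""

    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of neighbours
        self.triangles = set()        # current matches, each a frozenset of 3 vertices

    def _matches_through(self, u, v):
        # Every triangle containing edge (u, v) has the form {u, v, w},
        # where w is adjacent to both u and v.
        return {frozenset((u, v, w)) for w in self.adj[u] & self.adj[v]}

    def add_edge(self, u, v):
        """Insert edge (u, v) and return the set of newly created triangles."""
        if u == v or v in self.adj[u]:
            return set()              # self-loop or duplicate edge: no change
        self.adj[u].add(v)
        self.adj[v].add(u)
        created = self._matches_through(u, v)
        self.triangles |= created
        return created

    def remove_edge(self, u, v):
        """Delete edge (u, v) and return the set of triangles that disappear."""
        if v not in self.adj[u]:
            return set()              # edge not present: no change
        removed = self._matches_through(u, v)
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.triangles -= removed
        return removed


miner = IncrementalTriangles()
for edge in [(1, 2), (2, 3), (3, 4)]:
    miner.add_edge(*edge)             # no triangle is closed yet
delta_add = miner.add_edge(1, 3)      # closes triangle {1, 2, 3}
delta_add2 = miner.add_edge(2, 4)     # closes triangle {2, 3, 4}
delta_del = miner.remove_edge(2, 3)   # breaks both triangles
```

Each call returns only the matches that the update created or destroyed, which is the kind of per-update delta that a periodic full recomputation would have to rediscover by diffing complete result sets.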
Performance
The implementation handles millions of updates per second with low latency and is significantly faster than periodically recomputing results on graph snapshots. Notably, it also outperforms static mining approaches even though it supports real-time graph modifications, an advantage attributed to efficient memory management and reduced communication overhead.
Architecture
Tesseract employs task parallelism rather than data parallelism, which enables dynamic task assignment while keeping load balanced across workers and cross-worker synchronization minimal at scale.
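The difference from data parallelism can be sketched with a shared task queue: instead of statically assigning a graph partition to each worker, every update becomes an independent task that whichever worker is free picks up next. The sketch below is an assumption-laden illustration, not Tesseract's scheduler: threads stand in for distributed workers, the graph is a read-only in-memory dictionary, and the names `triangles_through` and `run_tasks` are hypothetical.

```python
import queue
import threading


def triangles_through(adj, edge):
    """Count triangles containing `edge` (the per-task mining work in this sketch)."""
    u, v = edge
    return len(adj[u] & adj[v])


def run_tasks(adj, edges, num_workers=4):
    """Process one task per edge; idle workers pull the next task dynamically."""
    tasks = queue.Queue()
    for edge in edges:
        tasks.put(edge)

    results = {}
    lock = threading.Lock()           # protects only the shared result map

    def worker():
        while True:
            try:
                edge = tasks.get_nowait()
            except queue.Empty:
                return                # no work left: this worker exits
            count = triangles_through(adj, edge)
            with lock:
                results[edge] = count

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Small undirected graph with triangles {1, 2, 3} and {2, 3, 4}.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
counts = run_tasks(adj, [(1, 2), (2, 3), (3, 4)])
```

Because workers pull tasks as they become free, an expensive task delays only the worker running it, and the only coordination points in this sketch are the queue and the result map.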
