LightSaber: Multi-core Window-Based Stream Processing

Window aggregation queries are a core part of streaming applications. To support window aggregation efficiently, stream processing engines face a trade-off between exploiting parallelism (at the instruction/multi-core levels) and incremental computation (across overlapping windows and queries). Existing engines implement ad-hoc aggregation and parallelization strategies. As a result, they only achieve high performance for specific queries depending on the window definition and the type of aggregation function.

We describe a general model for the design space of window aggregation strategies. Based on this, we introduce LightSaber, a new stream processing engine that balances parallelism and incremental processing when executing window aggregation queries on multi-core CPUs. Its design generalises existing approaches: (i) for parallel processing, LightSaber constructs a parallel aggregation tree (PAT) that exploits the parallelism of modern processors. The PAT divides window aggregation into intermediate steps that enable the efficient use of both instruction-level (i.e., SIMD) and task-level (i.e., multi-core) parallelism; and (ii) to generate efficient incremental code from the PAT, LightSaber uses a generalized aggregation graph (GAG), which encodes the low-level data dependencies required to produce aggregates over the stream. A GAG thus generalizes state-of-the-art approaches for incremental window aggregation and supports work-sharing between overlapping windows. LightSaber achieves up to an order of magnitude higher throughput compared to existing systems---on a 16-core server, it processes 470 million records/s with 150μs latency.

Implementation and source code

The LightSaber prototype implementation is available on GitHub.

Related Publications

Georgios Theodorakis, Fotios Kounelis, Peter Pietzuch, and Holger Pirk
48th International Conference on Very Large Data Bases (VLDB), 2022
Sydney, Australia
Georgios Theodorakis, Alexandros Koliousis, Peter Pietzuch, and Holger Pirk
ACM International Conference on Management of Data (SIGMOD), 2020
Portland, OR, USA
Georgios Theodorakis, Peter Pietzuch, and Holger Pirk
23rd International Conference on Extending Database Technology (EDBT), 2020
Georgios Theodorakis, Alexandros Koliousis, Peter Pietzuch, and Holger Pirk
9th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS), 2018
Rio de Janeiro, Brazil
Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Paolo Costa, Alexander L. Wolf, and Peter Pietzuch
ACM International Conference on Management of Data (SIGMOD), 2016
San Francisco, CA, USA