Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation
The computation of sliding window aggregates is one of the core functionalities of stream processing systems. Presently, there are two classes of approaches to evaluating them. The first is non-incremental, i.e., every window is evaluated in isolation even if overlapping windows provide opportunities for work-sharing. While not algorithmically efficient, this class of algorithm is usually very CPU efficient. The other approach is incremental: to the amount possible, the result of one window evaluation is used to help with the evaluation of the next window. While algorithmically efficient, the inherent control-dependencies in the CPU instruction stream make this highly CPU inefficient. In this paper, we analyse the state of the art in efficient incremental window processing and extend the fastest known algorithm, the Two-Stacks approach with known as well as novel optimisations. These include SIMD-parallel processing, internal data structure decomposition and data minimalism. We find that, thus optimised, our incremental window aggregation algorithm outperforms the state-of-the-art incremental algorithm by up to 11×. In addition, it is at least competitive and often significantly (up to 80%) faster than a non-incremental algorithm. Consequently, stream processing systems can use our proposed algorithm to cover incremental as well as non-incremental aggregation resulting in systems that are simpler as well as faster.
9th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS)
Publication Year