Over the past two decades, distributed stream processing engines (SPEs) have become a prominent component in the big data management tool-chain to support real-time, stateful data analytics applications on high-volume, high-velocity data streams in cloud deployments.
To this end, current SPEs continuously execute Map/Reduce-like pipelines on continuous data and apply data-centric parallelism to scale out on a cluster of servers.
Current SPEs assume so-called commodity hardware as they follow Map/Reduce-like paradigms based on the shared-nothing architecture.
Furthermore, they are agnostic to hardware configuration as they rely on managed runtimes, such as a Java Virtual Machine.
However, modern cloud computing infrastructures have improved dramatically in recent years, and the common wisdom that they mainly provide commodity hardware no longer holds.
For instance, cloud platform vendors provide powerful compute and network capabilities: they offer servers with high-end, many-core CPUs and large caches as well as high-speed networks, such as InfiniBand with Remote Direct Memory Access (RDMA) support.
Furthermore, modern cloud computing infrastructure is highly flexible, as it provides ad-hoc provisioning of resources, which enables scaling compute and storage capabilities as well as coping with failures while a deployed application is running.
In this thesis, we show that the current generation of SPEs has not evolved to leverage the above advancements of cloud computing platforms.
In fact, we experimentally demonstrate that they perform inefficiently when deployed on platforms that feature HPC-grade CPUs, high-speed networks, and ad-hoc, flexible resource provisioning.
Overall, in this thesis, we present hardware-conscious solutions to efficiently execute stateful stream processing applications on this modern computing infrastructure.
Bonaventura Del Monte has been a PhD student at DIMA since September 2016.
He has been conducting research on efficient stateful stream processing, published at international venues such as VLDB, ACM SIGMOD, and CIDR.
He is currently working on stateful stream processing on high-speed networks and has further interests in modern hardware and efficient stateful query execution.