Distributed Query Processing on Thousands of Cores
Claude Barthels, ETH Zurich
Abstract
Traditional database operators such as joins are relevant not only in the
context of database engines but also as a building block in many computational and
machine learning algorithms. In this talk, we explore the implementation of distributed
join algorithms in systems with several thousand cores connected by a high-throughput,
low-latency network. We compare radix hash join to sort-merge join algorithms, discuss
their implementation at this scale, show the impact and advantages of RDMA, and
discuss the importance of network scheduling. The experimental results show that the
algorithms we present scale well with the number of cores. At the end of the talk, I will
highlight future research directions (e.g., In-Network Processing) we are currently exploring
at ETH Zurich.
About the speaker
Claude Barthels is a PhD candidate at ETH Zurich. His research interests encompass
large-scale data management, distributed data processing, and high-performance networks.
Claude is advised by Gustavo Alonso and Torsten Hoefler. Prior and during his doctoral studies,
Claude was an intern at IBM Research and Oracle Labs working on database acceleration
projects. His expected graduation date is end of 2018.