Traditional database operators such as joins are relevant not only in the
context of database engines but also as a building block in many computational and
machine learning algorithms. In this talk, we explore the implementation of distributed
join algorithms in systems with several thousand cores connected by a high-throughput,
low-latency network. We compare radix hash join to sort-merge join algorithms, discuss
their implementation at this scale, show the impact and advantages of RDMA, and
discuss the importance of network scheduling. The experimental results show that the
algorithms we present scale well with the number of cores. At the end of the talk, I will
highlight future research directions (e.g., In-Network Processing) we are currently exploring
at ETH Zurich.
Who
Claude Barthels
Where
Huxley 217
Presenter Bio
Claude Barthels is a PhD candidate at ETH Zurich. His research interests encompass
large-scale data management, distributed data processing, and high-performance networks.
Claude is advised by Gustavo Alonso and Torsten Hoefler. Prior and during his doctoral studies,
Claude was an intern at IBM Research and Oracle Labs working on database acceleration
projects. His expected graduation date is end of 2018.
large-scale data management, distributed data processing, and high-performance networks.
Claude is advised by Gustavo Alonso and Torsten Hoefler. Prior and during his doctoral studies,
Claude was an intern at IBM Research and Oracle Labs working on database acceleration
projects. His expected graduation date is end of 2018.
Speaker affiliation
ETH Zurich