Seminars

Distributed Query Processing on Thousands of Cores

Claude Barthels, ETH Zurich

Abstract

Traditional database operators such as joins are relevant not only in the context of database engines but also as a building block in many computational and machine learning algorithms. In this talk, we explore the implementation of distributed join algorithms in systems with several thousand cores connected by a high-throughput, low-latency network. We compare radix hash join to sort-merge join algorithms, discuss their implementation at this scale, show the impact and advantages of RDMA, and discuss the importance of network scheduling. The experimental results show that the algorithms we present scale well with the number of cores. At the end of the talk, I will highlight future research directions (e.g., In-Network Processing) we are currently exploring at ETH Zurich.

About the speaker

Claude Barthels is a PhD candidate at ETH Zurich. His research interests encompass large-scale data management, distributed data processing, and high-performance networks. Claude is advised by Gustavo Alonso and Torsten Hoefler. Prior and during his doctoral studies, Claude was an intern at IBM Research and Oracle Labs working on database acceleration projects. His expected graduation date is end of 2018.

Date & Time

Thursday, March 1, 2018 - 10:00

Location

Huxley 217