Building modern dataflow systems
Dr. Frank McSherry, ETH Zürich
I'll talk through the design and implementation of "timely dataflow in Rust", an open-source project that extends and enriches the "timely dataflow" computational model first presented by the Naiad system, and the differential dataflow framework built on top of it. The project's goal is to provide a near-zero-overhead framework for data-parallel dataflow computation, and to this end it simplifies and unifies several of Naiad's concepts through lossless abstractions that largely compile away. Our experience has been that timely dataflow programs give best-in-class performance, while still providing the experience of a medium-to-high-level programming language. To support this, I'll walk through the example of differential dataflow, an incremental recomputation framework which seems to outperform the current crop of specialized data processing systems, in part due to its ability to provide general computation abstractions that compile down to sequential scans over carefully managed resources. These projects reflect joint work with a great many people, including what was once the Naiad team at MSR-SV, the Systems Group at ETH Zürich, and many other collaborators.
About the speaker
Frank McSherry received his PhD from the University of Washington, working with Anna Karlin on spectral analysis of data. He then spent twelve years at Microsoft Research's Silicon Valley research center, working on topics ranging from differential privacy to data-parallel computation. He currently works at ETH Zürich's Systems Group on scalable stream processing and related topics.
Date & Time
Wednesday, July 11, 2018 - 11:00
Huxley 144