Seminars

A combined batch-processing and data-streaming runtime in Apache Flink
Kostas Tzoumas and Stephan Ewen, data Artisans
Abstract

This talk gives an overview of the technology inside Apache Flink. Apache Flink is an open source project that develops a unified approach to batch and streaming data analysis via distributed dataflows, rooted in the Stratosphere research project.

To the user, Flink offers rich APIs for batch and streaming data processing programs, with flexible windowing semantics and seamless integration with Java and Scala programs. In addition, the Flink community is actively developing a series of libraries for different use cases, such as Batch Machine Learning, Streaming Machine Learning, Graph Analysis, and high-level language queries.

The runtime of Apache Flink is a flexible stream processing system with optimizations to efficiently handle both batch programs (finite streams) and continuous streaming programs (infinite streams). Flink implements different recovery mechanisms (rollback/restart for finite streams, distributed snapshotting for infinite streams) and scheduling mechanisms, as well as robust custom memory management to efficiently scale to data sets larger than main memory. Flink offers deep support for iterative programs via a restricted form of cyclic dataflows and can exploit stateful computation to support machine learning and graph analysis algorithms very efficiently.

About the speaker

Stephan Ewen is a committer at Apache Flink and co-founder and CTO of data Artisans, a Berlin-based company that is developing and contributing to Apache Flink. Before co-founding data Artisans, Stephan led the development of Flink from the early days of the project (then called Stratosphere). Stephan has a PhD in Computer Science from TU Berlin and has worked at IBM Research and Microsoft Research during several internships.

Kostas Tzoumas is a committer at Apache Flink and co-founder and CEO of data Artisans, a Berlin-based company that is developing and contributing to Apache Flink. Before founding data Artisans, Kostas was a postdoctoral researcher at TU Berlin. He received a PhD in Computer Science from Aalborg University and has worked at the University of Maryland, College Park, and Microsoft Research in Redmond during several internships.

Date & Time
Friday, May 8, 2015 - 11:00
Location
SALC 5