This talk gives an overview of the technology inside Apache Flink. Apache Flink is an open source project, rooted in the Stratosphere research project, that develops a unified approach to batch and streaming data analysis via distributed dataflows.
To the user, Flink offers rich APIs for batch and streaming data processing programs, with flexible windowing semantics and seamless integration with Java and Scala programs. In addition, the Flink community is actively developing a series of libraries for different use cases, such as batch machine learning, streaming machine learning, graph analysis, and high-level language queries.
The runtime of Apache Flink is a flexible stream processing system, with optimizations to efficiently handle both batch programs (finite streams) and continuous streaming programs (infinite streams). Flink implements different recovery mechanisms (rollback/restart for finite streams, distributed snapshotting for infinite streams) and scheduling mechanisms, as well as robust custom memory management that scales efficiently to data sets larger than main memory. Flink offers deep support for iterative programs via a restricted form of cyclic dataflows and can exploit stateful computation to support machine learning and graph analysis algorithms very efficiently.
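The idea that a batch program is just a finite stream can be illustrated with a small, self-contained sketch. The code below is plain Java and does not use Flink's API; the class name `TumblingWindowSketch`, the method `tumblingSums`, and the `maxWindows` cutoff are hypothetical names introduced only for this illustration. It applies the same tumbling count window operator to a finite input and to an unbounded one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Conceptual sketch only: NOT Flink's API. All names here are hypothetical.
public class TumblingWindowSketch {

    // Sums each tumbling (non-overlapping) window of windowSize elements.
    // Stops when the stream ends or once maxWindows results were emitted,
    // so the same operator can consume finite and infinite streams.
    public static List<Integer> tumblingSums(Iterator<Integer> stream,
                                             int windowSize,
                                             int maxWindows) {
        List<Integer> results = new ArrayList<>();
        int sum = 0;
        int count = 0;
        while (results.size() < maxWindows && stream.hasNext()) {
            sum += stream.next();
            count++;
            if (count == windowSize) {
                results.add(sum);   // window is full: emit and reset
                sum = 0;
                count = 0;
            }
        }
        // count > 0 here only if a finite stream ended mid-window;
        // emit the partial window so no element is lost.
        if (count > 0) {
            results.add(sum);
        }
        return results;
    }

    public static void main(String[] args) {
        // "Batch": a finite stream that is consumed to the end.
        Iterator<Integer> finite = Arrays.asList(1, 2, 3, 4, 5).iterator();
        System.out.println(tumblingSums(finite, 2, Integer.MAX_VALUE)); // [3, 7, 5]

        // "Streaming": an unbounded stream; we only look at the first 3 windows.
        Iterator<Integer> infinite = Stream.iterate(1, n -> n + 1).iterator();
        System.out.println(tumblingSums(infinite, 2, 3)); // [3, 7, 11]
    }
}
```

The only difference between the two calls is whether the input ever ends. A real engine would additionally need the recovery, scheduling, and memory management mechanisms described above, none of which this sketch attempts to model.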
Stephan Ewen is a committer at Apache Flink and co-founder and CTO of data Artisans, a Berlin-based company that is developing and contributing to Apache Flink. Before co-founding data Artisans, Stephan led the development of Flink from the early days of the project (then called Stratosphere). Stephan has a PhD in Computer Science from TU Berlin, and has been with IBM Research and Microsoft Research in the course of several internships.
Kostas Tzoumas is a committer at Apache Flink and co-founder and CEO of data Artisans, a Berlin-based company that is developing and contributing to Apache Flink. Before founding data Artisans, Kostas was a postdoctoral researcher at TU Berlin. He received a PhD in Computer Science from Aalborg University and has been with the University of Maryland, College Park, and Microsoft Research in Redmond in the course of several internships.