MDF: Meta-Dataflows

One well-publicized promise of the information age is a growing ability to develop richer analyses of big data in search of underlying information. To this end, much work has focused on handling larger amounts of data, in the hope that processing more data yields more information. While this has produced some amazing tools, it ignores one of the main dimensions of in-depth analysis: the variability of the analysis function. With Meta-Dataflows (MDFs), we set out to create a tool that lets users specify a solution space of many different ways to analyse a set of data and compare the outputs of those analyses against each other. MDFs automatically explore this solution space and find the user's most desired outcome(s) without the need to change the code or deployment configuration.

Exploring the solution space requires a lot of computation, as well as duplicating data so that each copy can be altered in a different way. An efficient exploration strategy is therefore important, and this is what our MDF extension to the SEEP framework provides.
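
To make the idea of a solution space concrete, the following is a minimal Python sketch. It is not the MDF API or SEEP code. The function `explore`, the stage names, the sample data, and the scoring function are all hypothetical and chosen only for illustration. The sketch runs every combination of stage variants, scores each final output, and reuses intermediate results shared by several variants, which is one simple way to avoid repeating work during exploration.

```python
from itertools import product
from statistics import mean, median


def explore(data, stages, score):
    """Run every combination of stage variants and pick the best one.

    stages: list of stages; each stage is a list of (name, function) variants.
    score:  maps a final output to a number; higher is better.
    Returns (best variant names, best output, number of stage executions).
    """
    cache = {}  # tuple of variant names so far -> intermediate result
    calls = 0   # how many stage functions were actually executed

    def run(prefix):
        nonlocal calls
        if not prefix:
            return data
        key = tuple(name for name, _ in prefix)
        if key not in cache:
            upstream = run(prefix[:-1])  # shared prefixes are computed once
            calls += 1
            cache[key] = prefix[-1][1](upstream)
        return cache[key]

    results = {}
    for combo in product(*stages):
        names = tuple(name for name, _ in combo)
        results[names] = run(combo)

    best = max(results, key=lambda names: score(results[names]))
    return best, results[best], calls


# Hypothetical example: two cleaning variants followed by two summary variants.
data = [1, 2, 3, 100]
stages = [
    [("keep_all", lambda xs: list(xs)),
     ("drop_gt_50", lambda xs: [x for x in xs if x <= 50])],
    [("mean", mean), ("median", median)],
]


def score(value):
    # Illustrative preference only: outputs closer to 2.4 rank higher.
    return -abs(value - 2.4)


best, output, calls = explore(data, stages, score)
```

In this example the four combinations need only six stage executions rather than eight, because each cleaning variant is computed once and shared by both summaries. The user changes only the list of variants and the scoring function, not the exploration code.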

Matthias Weidlich (Humboldt University, Germany)
Pijika Watcharapichat (Microsoft Research Cambridge)
Victoria Lopez Morales (Sainsbury's, UK)
William Culhane (Google, US)

Related Publications

Raul Castro Fernandez, William Culhane, Pijika Watcharapichat, Matthias Weidlich, Victoria Lopez Morales, and Peter Pietzuch
ACM Conference on Management of Data (SIGMOD), 2018
Houston, TX, US