Seminars

Graph Views for Efficient Graph Analytics (in collaboration with Microsoft CISL)
Joana M. F. da Trindade, MIT
Abstract
Graphs are an increasingly popular way to model real-world entities and relationships between them. At Microsoft, amongst many other use cases, we use graphs to record dependencies between users, jobs, files, and tasks in our big-data infrastructure. These graphs, which grow by TBs/day, are used for auditing, service analytics, and to power advanced system optimizations. In this paper, we focus on analytics over such dependency graphs. Our key observation is that these real-world graphs are often substantially more structured than a generic node-and-edge model would suggest and that this structure can be leveraged to significantly optimize query performance. This insight allows us to use structural properties of graphs and queries to automatically derive incrementally maintainable materialized graph views that dramatically speed up many queries. Such materialization techniques are compatible with most graph processing systems. We show on a range of queries on real graphs that these techniques substantially reduce the effective graph size and yield significant performance speed-ups (up to 70x), in some cases making otherwise intractable queries possible.
About the speaker
Joana is a second-year PhD student in EECS at MIT, working with Prof. Sam Madden in MIT's Database group. Her research interests revolve around performance aspects of distributed storage and data processing systems, primarily focusing on query optimization techniques for large-scale graph data management. Prior to MIT she was a software engineer at Google, spending most of her time there on a Storage Infrastructure team whose goal was to radically improve the performance of some of Google's proprietary distributed storage systems, including Bigtable, Colossus, and Blobstore. She also contributed to metadata storage and data infrastructure tasks for Google Drive and Google's Jamboard. In a past life she obtained an M.S. in Computer Science from the University of Illinois at Urbana-Champaign, and a B.S. in Computer Science from UFRGS, Brazil.
Date & Time
Saturday, April 21, 2018 - 11:00
Location
SAFB 164