Joana M. F. da Trindade, MIT
Abstract
Graphs are an increasingly popular way to model real-world entities and
relationships between them. At Microsoft, amongst many other use cases, we
use graphs to record dependencies between users, jobs, files, and tasks in
our big-data infrastructure. These graphs, which grow by terabytes per day, are
used for auditing and service analytics, and to power advanced system optimizations.
In this paper, we focus on analytics over such dependency graphs. Our key
observation is that these real-world graphs are often substantially more
structured than a generic node-and-edge model would suggest and that this
structure can be leveraged to significantly optimize querying performance.
Building on this insight, we use structural properties of graphs and queries
to automatically derive incrementally maintainable materialized graph views
that dramatically speed up many queries. Such materialization techniques are
compatible with most graph processing systems. We show on a range of queries
on real graphs that these techniques substantially reduce the effective graph
size and yield significant performance speed-ups (up to 70x), in some cases
making otherwise intractable queries possible.
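
As a rough illustration of the idea (not the system described in the talk), the
sketch below assumes a hypothetical dependency graph in which jobs write files
and files are read by jobs. It keeps a job-to-job view that collapses each
job-file-job path into a single edge and updates that view incrementally as new
edges arrive. The class name `JobToJobView` and its methods are invented for
this example.

```python
from collections import defaultdict


class JobToJobView:
    """Hypothetical materialized view: job -> file -> job collapsed to job -> job."""

    def __init__(self):
        self.writers = defaultdict(set)  # file -> jobs that wrote it
        self.readers = defaultdict(set)  # file -> jobs that read it
        self.view = defaultdict(set)     # job -> directly dependent jobs

    def add_write(self, job, file):
        """Record 'job wrote file' and extend the view with the new 2-hop paths."""
        self.writers[file].add(job)
        for reader in self.readers[file]:
            self.view[job].add(reader)

    def add_read(self, file, job):
        """Record 'job read file' and extend the view with the new 2-hop paths."""
        self.readers[file].add(job)
        for writer in self.writers[file]:
            self.view[writer].add(job)

    def downstream(self, job):
        """Return all jobs reachable from `job`, traversing only view edges."""
        seen, stack = set(), [job]
        while stack:
            for nxt in self.view.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
```

A query such as "which jobs are downstream of a given job" then touches only
job nodes and never visits file nodes, which is one way a view can shrink the
effective graph size.
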
About the speaker
Joana is a second-year PhD student in EECS at MIT, working with Prof. Sam
Madden in MIT's Database Group. Her research interests revolve around performance
aspects of distributed storage and data processing systems, primarily focusing
on query optimization techniques for large-scale graph data management. Prior
to MIT, she was a software engineer at Google, spending most of her time there on
a Storage Infrastructure team whose goal was to radically improve performance
of some of Google's proprietary distributed storage systems, including Bigtable,
Colossus, and Blobstore. She also contributed to metadata storage and data
infrastructure tasks for Google Drive and Google's Jamboard. In a past life, she
obtained an M.S. in Computer Science from the University of Illinois at
Urbana-Champaign, and a B.S. in Computer Science from UFRGS, Brazil.