Graphs are an increasingly popular way to model real-world entities and
relationships between them. At Microsoft, amongst many other use cases, we
use graphs to record dependencies between users, jobs, files, and tasks in
our big-data infrastructure. These graphs, which grow by TBs/day, are used
for auditing, service analytics, and to power advanced system optimizations.
In this paper, we focus on analytics over such dependency graphs. Our key
observation is that these real-world graphs are often substantially more
structured than a generic node-and-edge model would suggest and that this
structure can be leveraged to significantly optimize querying performance.
This insight allows us to leverage structural properties of graphs and queries
to automatically derive incrementally-maintainable materialized graph views
that dramatically speed up many queries. Such materialization techniques are
compatible with most graph processing systems. We show on a range of queries
on real graphs that these techniques substantially reduce the effective graph
size and yield significant performance speed-ups (up to 70x), in some cases
making otherwise intractable queries possible.