I started working on systems for machine learning around a year ago, when I joined the AI Frameworks team at Microsoft; the work spans inference on small client devices up to model training over distributed GPU clusters.
I am going to talk about some of the things that surprised me. Central to this is the difficulty of identifying good abstractions. We have techniques such as pipeline parallelism and tensor parallelism, but their implementations are tightly coupled with the models themselves, and optimized alongside them. It is tempting to design fresh alternatives that decouple distribution techniques from models; however, deploying these into an evolving ecosystem is itself a challenge.
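To make the coupling concrete, here is a minimal sketch (my own illustration, not a specific implementation from the talk) of Megatron-style tensor parallelism for a two-layer MLP in PyTorch. The class name TensorParallelMLP and the dimensions are hypothetical; the point is that the sharding decisions and the communication primitive are written directly into the model's forward pass, so changing the distribution strategy means editing the model itself.

```python
# Sketch: tensor-parallel MLP where distribution logic is baked into
# the model code. Assumes torch.distributed.init_process_group() has
# already been called and each rank runs this same script.
import torch
import torch.nn as nn
import torch.distributed as dist


class TensorParallelMLP(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        world = dist.get_world_size()
        assert d_hidden % world == 0, "hidden dim must divide across ranks"
        shard = d_hidden // world
        # Column-parallel first layer: each rank holds a slice of the
        # hidden dimension, so no communication is needed after it.
        self.fc1 = nn.Linear(d_model, shard)
        # Row-parallel second layer: each rank computes a partial sum
        # over its slice of the hidden dimension (bias omitted so it is
        # not added once per rank during the reduction).
        self.fc2 = nn.Linear(shard, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        partial = self.fc2(torch.relu(self.fc1(x)))
        # The communication step lives inside the model: swapping in a
        # different parallelism scheme means rewriting this method.
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial
```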
I will wrap up by describing some of our recent work to chart an incremental path through these topics.
Please email for a Zoom link.
I am a Principal Architect at Microsoft, focused on PyTorch and ONNX Runtime. Prior to that I was with AWS, working on large-scale storage performance and data analytics with Amazon S3. Further back, I led the Oracle Labs group in Cambridge, UK, working on runtime systems for in-memory graph analytics and on the confluence of “big data” systems with ideas from high-performance computing. Before joining Oracle I had an earlier stint at Microsoft (2004–2012), and before that I was on the faculty of the University of Cambridge Computer Laboratory (2000–2004) during the early days of the Xen hypervisor project.