I started work in systems for machine learning around a year ago, joining the AI Frameworks team at Microsoft, and working from inference on small client devices up to model training over distributed GPU clusters.
I am going to talk about some of the things that surprised me. Central to this is the difficulty in identifying abstractions. We have techniques such as pipeline parallelism and tensor parallelism, but the implementations are tightly coupled with the models themselves, and optimized alongside them. It is tempting to design fresh alternatives that can decouple distribution techniques from models; however, deploying these into an evolving ecosystem is itself a challenge.
I will wrap up by describing some of our recent work to chart an incremental path through these topics.
Please email for a
Zoom link