Cluster management at Google with Borg
John Wilkes, Principal Software Engineer, Google
Google’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior. This is a longer version of the EuroSys paper talk on Borg. It will include a quick summary of the Borg system architecture and features, provide a quantitative analysis of some of its policy decisions, and then explain how Borg has influenced the open-source Kubernetes system.
About the speaker
John Wilkes has been at Google since 2008, where he works on cluster management and infrastructure services. Before that, he spent a long time at HP Labs, becoming an HP and ACM Fellow in 2002. He is interested in far too many aspects of distributed systems, but a recurring theme has been technologies that allow systems to manage themselves. In his spare time he continues, stubbornly, trying to learn how to blow glass.
Date & Time
Thursday, November 19, 2015 - 14:00
Location
Huxley 218