Seminars

Building Better Datacenters - The Quest for Low Latency

Simon Peter, University of Texas at Austin

Abstract

As datacenter applications grow in number and complexity, datacenter-internal service latency requirements are dropping into the microsecond range. Providing consistent microsecond-scale service latencies at increasing datacenter utilization is difficult, especially at scale, where failures are common. Operating system functionality on the service critical path often incurs high, millisecond-scale overhead and introduces even longer queueing delays as utilization increases and during failover. My research aims to dramatically lower service latencies under rising utilization by co-designing hardware and operating system functionality to remove these overheads from the critical path, even when failures are common.

My recent focus has been on building low-latency, highly available storage systems. The adoption of low-latency persistent memory modules (PMMs) in datacenter servers upends the long-established model of remote storage for distributed file systems. By colocating computation with PMM storage instead, we can provide applications with much lower IO and application failover latencies while offering strong consistency. I present Assise, a new distributed file system based on a persistent, replicated coherence protocol that manages client-local PMM as a linearizable and crash-recoverable cache between applications and slower (and possibly remote) storage. Assise maximizes locality for all file IO by carrying out IO on process-local, socket-local, and client-local PMM whenever possible. Assise minimizes coherence overhead by maintaining consistency at IO operation granularity rather than at fixed block sizes. Compared to the state of the art, Assise improves IO latency, throughput, and failover time by an order of magnitude while providing stronger consistency semantics. I finish with an overview of further research in this space and an outlook on the impending energy constraints of large-scale systems, which lead to a future research agenda in energy-resilient system design.

Please email for a Zoom link

About the speaker

Simon is an assistant professor of computer science at The University of Texas at Austin. He works to dramatically improve datacenter efficiency and reliability by designing, building, and evaluating new alternatives for their hardware and software components. Simon currently co-designs networking and storage stacks with new hardware technologies to reduce service latencies by orders of magnitude beyond today's capabilities.

Simon is the director of the Texas Systems Research Consortium, where he collaborates closely with industry to shape the future of cloud computing. Simon's work is supported by VMware, Microsoft Research, Huawei, Google, Citadel Securities, and Arm. Simon received the SIGOPS Hall of Fame Award in 2020. He twice received the Jay Lepreau Best Paper Award (2014 and 2016), and he received a Memorable Paper Award in 2018 and an IEEE Micro Top Pick Honorable Mention in 2021. He has also received an NSF CAREER Award and is a Sloan Research Fellow. Before joining UT Austin in 2016, Simon was a research associate at the University of Washington from 2012 to 2016. He received a Ph.D. in Computer Science from ETH Zurich in 2012.

Date & Time

Thursday, May 13, 2021 - 14:00

Location

Online