As datacenter applications grow in number and complexity,
datacenter-internal service latency requirements are dropping into the
microsecond range. Providing consistent microsecond-scale service
latencies at increasing datacenter utilization is difficult,
especially at scale, where failures are common. Operating system
functionality on the service critical path often incurs high,
millisecond-scale overhead, and introduces even longer queueing delay
as utilization increases and during failover. My research aims to
dramatically lower service latencies under rising utilization by
co-designing hardware and operating system functionality to remove
these overheads from the critical path, even when failures are
common.
My recent focus has been on building low-latency, highly available storage
systems. The adoption of low-latency persistent memory modules (PMMs)
in datacenter servers upends the long-established model of remote
storage for distributed file systems. Instead, by colocating
computation with PMM storage we can provide applications with much
lower IO and application failover latencies, while offering strong
consistency. I present Assise, a new distributed file system, based on
a persistent, replicated coherence protocol that manages client-local
PMM as a linearizable and crash-recoverable cache between applications
and slower (and possibly remote) storage. Assise maximizes locality
for all file IO by carrying out IO on process-local, socket-local, and
client-local PMM whenever possible. Assise minimizes coherence
overhead by maintaining consistency at IO operation granularity,
rather than at fixed block sizes. Assise improves IO latency,
throughput, and failover time by an order of magnitude versus the
state-of-the-art, while providing stronger consistency semantics. I
finish with an overview of further research in this space, and an
outlook to impending energy constraints of large scale systems,
leading to a future research agenda in energy-resilient system design.
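To make the operation-granularity idea concrete, the following minimal C sketch illustrates the kind of write path such a design implies: each write is appended as a single log record to a process-local update log in PMM, persisted with cache-line flushes and a fence, and then replicated to a hot standby before being acknowledged. The record layout, the function names (persist, replicate_to_hot_replica, write_op), and the replication stub are illustrative assumptions, not Assise's actual API.

    /* Hypothetical sketch of operation-granularity logging to client-local
     * PMM; names and layout are illustrative, not Assise's actual API. */
    #include <emmintrin.h>   /* _mm_clflush, _mm_sfence */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* One log record per IO operation, not per fixed-size block. */
    struct op_record {
        uint64_t inode;
        uint64_t offset;
        uint32_t length;
        uint8_t  data[];      /* exactly `length` bytes follow */
    };

    /* Persist a byte range that lives in memory-mapped PMM:
     * flush each cache line, then order with a store fence. */
    static void persist(const void *addr, size_t len) {
        for (size_t off = 0; off < len; off += 64)
            _mm_clflush((const char *)addr + off);
        _mm_sfence();
    }

    /* Placeholder: a real system would push the record to the next hot
     * replica's PMM (e.g. over RDMA) so a standby can take over with the
     * data already persistent. */
    static int replicate_to_hot_replica(const void *rec, size_t len) {
        (void)rec; (void)len;
        return 0;
    }

    /* Append one write as a log record to the process-local update log in
     * PMM, persist it, then replicate it before acknowledging. */
    int write_op(struct op_record *rec, uint64_t inode, uint64_t offset,
                 const void *buf, uint32_t len) {
        rec->inode  = inode;
        rec->offset = offset;
        rec->length = len;
        memcpy(rec->data, buf, len);
        persist(rec, sizeof *rec + len);
        return replicate_to_hot_replica(rec, sizeof *rec + len);
    }

Because each record carries only the bytes an operation actually wrote, coherence and replication traffic scale with the IO itself rather than with a fixed block size.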
Please email for a Zoom link.
Simon is an assistant professor in computer science at The University
of Texas at Austin. Simon works to dramatically improve data center
efficiency and reliability by designing, building, and evaluating new
alternatives for their hardware and software components. Simon
currently co-designs networking and storage stacks with new hardware
technologies to reduce service latencies by orders of magnitude beyond
today's capabilities.
Simon is the director of the Texas Systems Research Consortium, where
he collaborates closely with industry to shape the future of cloud
computing. Simon's work is supported by VMware, Microsoft Research,
Huawei, Google, Citadel Securities, and Arm. Simon received the SIGOPS
Hall of Fame Award in 2020. He received the Jay Lepreau Best Paper Award
twice (in 2014 and 2016), an IEEE Micro Top Pick Honorable Mention in 2021,
and a Memorable Paper Award in 2018. He also received an NSF CAREER Award and
is a Sloan Research Fellow. Before joining UT
Austin in 2016, Simon was a research associate at the University of
Washington from 2012 to 2016. He received a Ph.D. in Computer Science
from ETH Zurich in 2012.