Seminars

Asynchronous Prefix Recoverability for Fast Distributed Stores

Tianyu Li, Massachusetts Institute of Technology

Abstract

Accessing and updating data sharded across distributed machines safely and quickly in the face of failures remains a challenging problem. Most prominently, applications that share state across different nodes want their writes to become visible to others quickly, without giving up recoverability guarantees if a failure occurs. Current solutions, which pair a fast cache with backing storage, cannot easily support this use case. In this work, we design a distributed protocol called Distributed Prefix Recovery (DPR), which builds on top of a sharded cache-store architecture with single-key operations to provide cross-shard recoverability guarantees. With DPR, many clients can read and update shared state at sub-millisecond latency while receiving periodic prefix durability guarantees. On failure, DPR quickly restores the system to a prefix-consistent state using a novel non-blocking rollback scheme. In this talk, I will discuss the details of the DPR algorithm and briefly cover our ongoing work to make DPR usable by a broader audience.

Please email for a Zoom link

About the speaker

Tianyu is a third-year PhD student at MIT, advised by Sam Madden. His research focuses on developing new fault-tolerance schemes optimized for modern cloud workloads. Such schemes must have low overhead so that applications can be decomposed into fine-grained execution slices (e.g., running on a serverless worker) without compromising guarantees or common-case performance. Before joining MIT, he received his BS and MS from CMU, where he was advised by Andy Pavlo.

Date & Time

Thursday, January 13, 2022 - 14:00

Location

Online