Scaling Storage Engines | LSDS - Large-Scale Data & Systems Group, Imperial College London

Our society is creating and storing exponentially increasing amounts of data. While this may seem obvious, the storage engines that allow us to store and access this data receive far less attention; these engines are necessary for maintaining vast amounts of information and for facilitating the access that extracts knowledge from bits and bytes. In the late 2000s, a new class of "write-optimized" storage engines emerged. These engines prioritize the efficiency of ingesting new data, and they have become widely used, including in Google’s and Amazon’s massive cloud databases. However, these storage engines are complex: they encompass many design and tuning decisions whose impact on performance is hard to predict. More troublingly, their performance deteriorates as the amount of data they contain grows. These problems, in turn, prevent applications running on top of them from scaling (e.g., across larger business markets or scientific experiments), or they force practitioners to invest more heavily in hardware and energy to keep performance steady.

In this talk, I will describe ongoing work on enabling such storage engines to continue functioning efficiently as data size increases. The goal is to allow the numerous applications that use these storage engines to continue extracting value and knowledge from Big Data through more efficient algorithms rather than over-provisioned hardware and energy.

Please email for a Zoom link

Niv Dayan is a researcher at Pliops, a storage acceleration startup in Tel Aviv. Before that, he was a postdoc at the Data Systems Lab at Harvard. Niv works at the intersection of systems and theory for designing efficient data storage.
