Dr. Pramod Bhatotia, University of Edinburgh
Abstract
Parallel and distributed systems are a pervasive component of the modern computing environment. Today, large-scale geo-distributed data centers, coupled with the computing resources available at the edge and on client/IoT devices, have become ubiquitous. This computing infrastructure consists of hundreds of thousands of heterogeneous computing resources, comprising general-purpose multicores, energy-efficient cores, and specialized accelerators such as GPUs and FPGAs. Such infrastructure powers not only some of the most popular consumer Internet services and scientific and enterprise workloads, but also a growing number of data-driven intelligent applications in the cyber-physical ecosystem. Given the growing importance of these diverse applications, my research focuses on building software systems for this modern heterogeneous computing infrastructure that balance fundamental trade-offs in performance, security, reliability, and (operational) cost.
In this talk, I will present three system design principles targeting modern hardware and applications: reliability, security, and scalability. More specifically, I will cover three example projects that showcase these design principles: (1) Reliability: how to leverage new ISA extensions to build reliable software systems; (2) Security: how to build secure systems on top of untrusted computing infrastructure using a combination of trusted execution environments (TEEs) and a small trusted computing base (TCB); and (3) Scalability: how to seamlessly support ever-growing application workloads with an increasing number of cores while embracing the heterogeneity of the underlying computing platform.
As I will show in the talk, we follow these design principles at all levels of the software stack: the operating system, storage and file systems, compilers and run-time libraries, and distributed middleware. Our approach transparently supports existing applications: we require neither a radical departure from current programming models nor complex, error-prone application-specific modifications.