Transparent Container Checkpointing and Rollback-Recovery with Kubernetes: Open Challenges and Research Directions
Radostin Stoyanov, University of Oxford
Abstract
In recent years, Kubernetes has been widely adopted as an orchestration platform for automating the deployment, scaling, and management of containerized applications. The recently introduced native integration of container checkpointing in Kubernetes enables dynamic relocation, scaling out, and load balancing of microservices, as well as fast startup times, forensic analysis, and fault tolerance for stateful applications.
In this talk, we will discuss some of the challenges of checkpointing long-running stateful applications and the performance trade-offs of periodic checkpointing and rollback recovery. We will also discuss how image-less checkpoint streaming approaches can be used to address some of these challenges, and outline future research directions.
About the speaker
Radostin Stoyanov is a DPhil student at the University of Oxford. His research focuses on improving the resilience and performance of HPC and cloud computing systems. Before joining Oxford, Radostin received his MPhil degree in Advanced Computer Science from the University of Cambridge and his MEng degree in Computing Science from the University of Aberdeen. His master's research explored virtualization in programmable network devices and secure image-less container migration.