Practicable live container migrations in high performance computing clouds: Diskless, iterative, and connection-persistent
Published in: | Journal of systems architecture 2024-07, Vol.152, p.103157, Article 103157 |
---|---|
Format: | Article |
Language: | English |
Summary: | Checkpoint/Restore techniques have been used extensively by the High Performance Computing (HPC) community for failure recovery. Given the current trend in HPC to use containerization to obtain fast, customized, portable, flexible, and reproducible deployments of workloads, as well as efficient and reliable sharing and management of HPC Cloud infrastructures, there is a need to integrate Checkpoint/Restore with containerization so that application freeze time is minimal and live migrations are practicable. Although current Checkpoint/Restore tools (such as CRIU) support several options to accomplish this, most of them are rarely exploited in HPC Clouds and, consequently, their potential impact on performance is barely known. This paper therefore explores the use of CRIU’s advanced features to implement diskless, iterative (pre-copy and post-copy) migrations of containers with external network namespaces and established TCP connections, so that memory-intensive and connection-persistent HPC applications can live-migrate. Our extensive experiments to characterize the performance impact of those features demonstrate that properly configured live migrations incur low application downtime and low memory/disk usage, and are indeed feasible in containerized HPC Clouds.
• We carry out fully-featured seamless live migrations of runC containers. • We minimize the downtime and disk utilization through diskless, iterative migrations. • We migrate containers with established TCP connections transparently to the clients. • We characterize the facets of live migrations in representative HPC benchmarks. |
---|---|
ISSN: | 1383-7621 1873-6165 |
DOI: | 10.1016/j.sysarc.2024.103157 |