
Adaptive Remus: adaptive checkpointing for Xen-based virtual machine replication

Bibliographic Details
Published in: International Journal of Parallel, Emergent and Distributed Systems, 2017-07, Vol. 32 (4), pp. 348-367
Main Authors: da Silva, Marcelo Pereira, Obelheiro, Rafael Rodrigues, Koslovski, Guilherme Piegas
Format: Article
Language: English
Description
Summary: With the ever-increasing dependence on computers and networks, many systems are required to be continuously available in order to fulfil their mission. Virtualization technology enables high availability to be offered in a convenient, cost-effective manner: with the encapsulation provided by virtual machines (VMs), entire systems can be replicated transparently in software, obviating the need for expensive fault-tolerant hardware. Remus is a VM replication mechanism for the Xen hypervisor that provides high availability despite crash failures. Replication is performed by checkpointing the VM at fixed intervals. However, processing and communication pull the optimal checkpoint interval in opposite directions: longer intervals benefit processor-intensive applications, whereas shorter intervals favour network-intensive applications. Consequently, no single fixed interval suits every hosted application, which limits the use of Remus in many scenarios. This work introduces Adaptive Remus, a proposal for adaptive checkpointing in Remus that dynamically adjusts the replication frequency according to the characteristics of the running applications. Experimental results indicate that our proposal improves performance for applications that require both processing and communication, without harming applications that use only one type of resource.

Adaptive Remus measures VM metrics to infer the load of the hosted application. With this information, the mechanism switches the checkpointing frequency between two modes:
(I) networking mode: the checkpointing frequency is increased whenever output traffic is detected on the VM's network interface; and
(II) processing mode: when there is no output traffic on the VM's interface, the checkpointing frequency is reduced, giving the VM more execution time between checkpoints.
This approach improves application performance by dynamically adapting the checkpoint interval.
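The two-mode policy described above can be illustrated with a small Python sketch. It maps the output traffic observed since the last checkpoint to a mode and to the next checkpoint interval. This is not the authors' implementation. The names `Mode`, `AdaptivePolicy`, `choose` and `simulate` are hypothetical, and so are the interval values (25 ms and 200 ms). The sketch also assumes output traffic is summarised as the number of bytes the VM transmitted on its interface during the last epoch. The networking mode makes sense because Remus holds outgoing packets until the checkpoint that produced them has been acknowledged by the backup host. Shorter intervals therefore release network output sooner.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NETWORKING = "networking"  # output traffic seen: checkpoint more often
    PROCESSING = "processing"  # no output traffic: checkpoint less often


@dataclass(frozen=True)
class AdaptivePolicy:
    # Hypothetical interval values in milliseconds; the actual settings
    # used by Adaptive Remus are not stated in the summary.
    networking_interval_ms: int = 25
    processing_interval_ms: int = 200

    def choose(self, tx_bytes_since_last_checkpoint: int) -> tuple[Mode, int]:
        """Pick the mode and next checkpoint interval from observed output traffic."""
        if tx_bytes_since_last_checkpoint > 0:
            # Outgoing packets are waiting for the next checkpoint: shorten the interval.
            return Mode.NETWORKING, self.networking_interval_ms
        # No pending output: let the VM run longer between checkpoints.
        return Mode.PROCESSING, self.processing_interval_ms


def simulate(policy: AdaptivePolicy, tx_samples: list[int]) -> list[tuple[Mode, int]]:
    """Apply the policy to a sequence of per-epoch transmitted-byte counts."""
    return [policy.choose(tx) for tx in tx_samples]


if __name__ == "__main__":
    for mode, interval in simulate(AdaptivePolicy(), [0, 1500, 0]):
        print(mode.value, interval)
```

For example, `simulate(AdaptivePolicy(), [0, 1500, 0])` selects processing, networking and processing modes, with intervals of 200, 25 and 200 ms respectively. This sketch switches modes immediately on every epoch. A real controller might also need to decide how quickly to leave networking mode once traffic stops.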
ISSN: 1744-5760, 1744-5779
DOI: 10.1080/17445760.2016.1162302