Optimizing Data Placement of MapReduce on Ceph-Based Framework under Load-Balancing Constraint

Bibliographic Details
Main Authors: Sha, Edwin H.-M, Yutong Liang, Weiwen Jiang, Xianzhang Chen, Qingfeng Zhuge
Format: Conference Proceeding
Language: English
Description
Summary: Ceph has been widely used as a distributed object store and file system due to its high availability, reliability, and scalability. In a Ceph deployment composed of heterogeneous clusters, the data placement strategy can greatly affect both system performance and load balancing. For a given application, it is critical to find the optimal data placement in Ceph so that the application's completion time is minimized under a load-balancing constraint. This paper presents a novel Ceph-based framework that jointly considers load balancing and heterogeneity, including differences in computational capacity and network bandwidth. The framework suits applications that follow the principle of moving computation rather than moving data across clusters, such as MapReduce. Based on this framework and the properties of MapReduce, we formulate a Mixed Integer Linear Programming (MILP) model to obtain the optimal data placement. Because solving the MILP has high computational complexity, we also devise an efficient algorithm that obtains near-optimal solutions. Experimental results show that the proposed algorithm achieves up to a 25.6% improvement in system performance compared with the original placement strategy implemented in Ceph.
ISSN: 1521-9097, 2690-5965
DOI: 10.1109/ICPADS.2016.0083
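As a rough illustration of the trade-off described in the summary, the following Python sketch places equal-size data blocks on heterogeneous nodes so that the slowest node (the makespan) finishes as early as possible, subject to a per-node cap that stands in for the load-balancing constraint. This is not the authors' MILP or algorithm. The cost model (per-block time = 1/bandwidth + 1/compute_rate), the cap ceil((1 + epsilon) * n / m), and all names such as `Node`, `greedy_placement`, and `brute_force_makespan` are hypothetical. Exhaustive search stands in for an exact solver and is only feasible for tiny instances, which hints at why a fast approximate algorithm is attractive.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Node:
    """Hypothetical node that stores blocks and runs map tasks on them locally."""
    name: str
    compute_rate: float  # blocks processed per second (assumed unit)
    bandwidth: float     # blocks transferred per second (assumed unit)

    def time_per_block(self) -> float:
        # Assumed cost model: transfer time plus local compute time per block.
        return 1.0 / self.bandwidth + 1.0 / self.compute_rate


def balance_cap(num_nodes: int, num_blocks: int, epsilon: float) -> int:
    # Assumed load-balancing constraint: no node holds more than
    # (1 + epsilon) times the even share of blocks.
    return math.ceil((1 + epsilon) * num_blocks / num_nodes)


def greedy_placement(nodes, num_blocks, epsilon):
    """Greedy heuristic: give each block to the eligible node that would finish earliest."""
    cap = balance_cap(len(nodes), num_blocks, epsilon)
    if cap * len(nodes) < num_blocks:
        raise ValueError("load-balancing cap is too tight to place all blocks")
    counts = [0] * len(nodes)
    for _ in range(num_blocks):
        # Only nodes below the cap are eligible; ties go to the lowest index.
        best = min(
            (j for j in range(len(nodes)) if counts[j] < cap),
            key=lambda j: (counts[j] + 1) * nodes[j].time_per_block(),
        )
        counts[best] += 1
    makespan = max(c * n.time_per_block() for c, n in zip(counts, nodes))
    return counts, makespan


def brute_force_makespan(nodes, num_blocks, epsilon):
    """Exact optimum by enumerating every capped block count per node (tiny inputs only)."""
    cap = balance_cap(len(nodes), num_blocks, epsilon)
    best = math.inf
    for counts in product(range(cap + 1), repeat=len(nodes)):
        if sum(counts) != num_blocks:
            continue
        best = min(best, max(c * n.time_per_block() for c, n in zip(counts, nodes)))
    return best
```

For example, with a fast node (0.5 s per block), a slow node (2.0 s per block), 10 blocks, and epsilon = 0.25, the cap is 7 blocks per node; the greedy sketch returns counts [7, 3] with a makespan of 6.0 s, matching the exhaustive search. Relaxing epsilon to 1.0 yields [8, 2] with a makespan of 4.0 s, showing how a tighter balance constraint can cost completion time on heterogeneous hardware.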