Optimizing Data Placement of MapReduce on Ceph-Based Framework under Load-Balancing Constraint
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | Ceph is widely used as a distributed object store and file system because of its high availability, reliability, and scalability. In a Ceph deployment composed of heterogeneous clusters, the data placement strategy can greatly affect system performance and load balancing. For a given application, it is critical to find the optimal data placement in Ceph, so that the application's completion time is minimized under the load-balancing constraint. This paper presents a novel Ceph-based framework that jointly considers load balancing and heterogeneity, including computational capacity and network bandwidth. The framework suits applications that follow the principle of moving computation rather than data across clusters, such as MapReduce. Based on the framework and the properties of MapReduce, we formulate a Mixed Integer Linear Programming (MILP) problem to obtain the optimal data placement. Because solving the MILP is computationally expensive, we also devise an efficient algorithm that obtains near-optimal solutions. Experimental results show that the proposed algorithm improves system performance by up to 25.6% compared with the original strategy implemented in Ceph. |
---|---|
ISSN: | 1521-9097 2690-5965 |
DOI: | 10.1109/ICPADS.2016.0083 |