Loading…

How to partition a billion-node graph

Billion-node graphs pose significant challenges at all levels from storage infrastructures to programming models. It is critical to develop a general purpose platform for graph processing. A distributed memory system is considered a feasible platform supporting online query processing as well as off...

Full description

Saved in:
Bibliographic Details
Main Authors: Lu Wang, Yanghua Xiao, Bin Shao, Haixun Wang
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Billion-node graphs pose significant challenges at all levels from storage infrastructures to programming models. It is critical to develop a general purpose platform for graph processing. A distributed memory system is considered a feasible platform supporting online query processing as well as offline graph analytics. In this paper, we study the problem of partitioning a billion-node graph on such a platform, an important consideration because it has direct impact on load balancing and communication overhead. It is challenging not just because the graph is large, but because we can no longer assume that the data can be organized in arbitrary ways to maximize the performance of the partitioning algorithm. Instead, the algorithm must adopt the same data and programming model adopted by the system and other applications. In this paper, we propose a multi-level label propagation (MLP) method for graph partitioning. Experimental results show that our solution can partition billion-node graphs within several hours on a distributed memory system consisting of merely several machines, and the quality of the partitions produced by our approach is comparable to state-of-the-art approaches applied on toy-size graphs.
ISSN:1063-6382
2375-026X
DOI:10.1109/ICDE.2014.6816682