Loading…

GraphD: Distributed Vertex-Centric Graph Processing Beyond the Memory Limit

We propose GraphD, an out-of-core Pregel-like system targeting efficient big graph processing with a small cluster of commodity PCs connected by Gigabit Ethernet, an environment affordable to most users. This is in contrast to some recent efforts for out-of-core graph computation with specialized ha...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on parallel and distributed systems 2018-01, Vol.29 (1), p.99-114
Main Authors: Da Yan, Yuzhen Huang, Miao Liu, Hongzhi Chen, Cheng, James, Huanhuan Wu, Chengcui Zhang
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We propose GraphD, an out-of-core Pregel-like system targeting efficient big graph processing with a small cluster of commodity PCs connected by Gigabit Ethernet, an environment affordable to most users. This is in contrast to some recent efforts for out-of-core graph computation with specialized hardware. In our setting, a vertex-centric program is often data-intensive, since the CPU cost of calculating a message value is negligible compared with the network cost of transmitting that message. As a result, network bandwidth is usually the bottleneck, and out-of-core execution would not sacrifice performance if disk IO overhead can be hidden by message transmission, which is achieved by GraphD through the parallelism of computation and communication. GraphD streams edge and message data on local disks, and thus consumes negligible memory space. For a broad class of Pregel algorithms where message combiner is applicable, GraphD completely eliminates the need of any expensive external-memory join or group-by, which is required by existing systems such as Pregelix and Chaos. Extensive experiments show that GraphD beats existing out-of-core systems by orders of magnitude, and achieves comparable performance to in-memory systems running with adequate memory.
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2017.2743708