Parallel computations of local PageRank problem based on Graphics Processing Unit
Published in: Concurrency and Computation, 2017-12, Vol. 29 (24), p. n/a
Format: Article
Language: English
Summary:
In this paper, we study the problem of improving the performance of the Markov chain Monte Carlo method for solving the local PageRank problem in a General Purpose Graphics Processing Unit (GPGPU) environment. Because a large number of dangling vertices requires a large amount of storage and slows down the Markov chain processing, we propose a reordering strategy that compresses the storage and reduces the computational complexity of the Markov chain processing. In our performance study, by parallelizing and optimizing the proposed algorithm on the GPU, the reordering strategy is up to 12× faster than the basic method on graphs with a high proportion of dangling vertices. Because the variance of the random walks determines how many random walks the computation needs, we introduce low-discrepancy sequences to enhance performance. Moreover, the low-discrepancy sequences are loaded into on-chip shared memory to accelerate fetching, with warp scheduling arranged to avoid bank conflicts. A series of experiments evaluates the optimization efficiency: compared with fetching data from off-chip global memory, the shared-memory-based strategy achieves over 10× speedup. The experiments also indicate that the size of the shared memory has a significant impact on the parallelism of the proposed method.
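The abstract describes the method only at a high level. As a rough, hedged illustration of the kind of computation involved, the sketch below runs Monte Carlo random walks from a seed vertex over a CSR graph, drawing termination and neighbour choices from a van der Corput low-discrepancy sequence staged in on-chip shared memory. This is not the authors' code: the kernel name mcLocalPageRank, the CSR layout, and the constants ALPHA, SEQ_LEN, and MAX_STEPS are assumptions made for illustration.

```cuda
// Hedged sketch only, not the paper's implementation: Monte Carlo random-walk
// (local / personalized) PageRank with a van der Corput low-discrepancy
// sequence cached in shared memory.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define ALPHA     0.85f   // damping factor (assumed)
#define SEQ_LEN   1024    // low-discrepancy numbers cached per block (assumed)
#define MAX_STEPS 64      // safety cap on walk length (assumed)

// Base-2 van der Corput sequence on the host; one of many possible
// low-discrepancy generators.
static void vanDerCorput(std::vector<float> &s)
{
    for (int i = 0; i < (int)s.size(); ++i) {
        float x = 0.f, f = 0.5f;
        for (int n = i + 1; n > 0; n >>= 1, f *= 0.5f)
            if (n & 1) x += f;
        s[i] = x;
    }
}

// One thread = one walk from 'source'; visit counts, once normalized,
// estimate the local PageRank vector.
__global__ void mcLocalPageRank(const int *rowPtr, const int *colIdx,
                                int source, int numWalks,
                                const float *ldSeq, unsigned int *visits)
{
    __shared__ float seq[SEQ_LEN];            // on-chip copy of the sequence
    for (int i = threadIdx.x; i < SEQ_LEN; i += blockDim.x)
        seq[i] = ldSeq[i];
    __syncthreads();

    int walk = blockIdx.x * blockDim.x + threadIdx.x;
    if (walk >= numWalks) return;

    int v = source;
    for (int step = 0; step < MAX_STEPS; ++step) {
        atomicAdd(&visits[v], 1u);
        // Stride the index so neighbouring threads read different entries;
        // a crude stand-in for the paper's warp scheduling.
        float u = seq[(walk * 33 + step) % SEQ_LEN];

        int begin = rowPtr[v], deg = rowPtr[v + 1] - begin;
        if (deg == 0 || u > ALPHA) break;     // dangling vertex or terminate
        int pick = min((int)((u / ALPHA) * deg), deg - 1);
        v = colIdx[begin + pick];
    }
}

int main()
{
    // Toy 4-vertex graph in CSR form; vertex 3 is dangling.
    std::vector<int> h_rowPtr = {0, 2, 3, 5, 5};
    std::vector<int> h_colIdx = {1, 2, 2, 0, 3};
    std::vector<float> h_seq(SEQ_LEN);
    vanDerCorput(h_seq);

    int *d_rowPtr, *d_colIdx; float *d_seq; unsigned int *d_visits;
    cudaMalloc(&d_rowPtr, h_rowPtr.size() * sizeof(int));
    cudaMalloc(&d_colIdx, h_colIdx.size() * sizeof(int));
    cudaMalloc(&d_seq, SEQ_LEN * sizeof(float));
    cudaMalloc(&d_visits, 4 * sizeof(unsigned int));
    cudaMemcpy(d_rowPtr, h_rowPtr.data(), h_rowPtr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_colIdx, h_colIdx.data(), h_colIdx.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_seq, h_seq.data(), SEQ_LEN * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_visits, 0, 4 * sizeof(unsigned int));

    int numWalks = 1 << 16;
    mcLocalPageRank<<<(numWalks + 255) / 256, 256>>>(d_rowPtr, d_colIdx, 0,
                                                     numWalks, d_seq, d_visits);

    unsigned int h_visits[4];
    cudaMemcpy(h_visits, d_visits, sizeof(h_visits), cudaMemcpyDeviceToHost);
    unsigned long long total = 0;
    for (unsigned int c : h_visits) total += c;
    for (int v = 0; v < 4; ++v)
        printf("vertex %d: %.4f\n", v, (double)h_visits[v] / (double)total);

    cudaFree(d_rowPtr); cudaFree(d_colIdx); cudaFree(d_seq); cudaFree(d_visits);
    return 0;
}
```

Normalizing the visit counts yields the personalized (local) PageRank estimate for the seed vertex. In a tuned implementation, the layout of the sequence in shared memory and the warp scheduling would be chosen to avoid bank conflicts, which is the optimization the abstract credits with the over-10× speedup against global-memory fetches.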
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.4245