Loading…

Fast parallel skew and prefix-doubling suffix array construction on the GPU

Summary Suffix arrays are fundamental full‐text index data structures of importance to a broad spectrum of applications in such fields as bioinformatics, Burrows–Wheeler transform‐based lossless data compression, and information retrieval. In this work, we propose and implement two massively paralle...

Full description

Saved in:
Bibliographic Details
Published in:Concurrency and computation 2016-08, Vol.28 (12), p.3466-3484
Main Authors: Wang, Leyuan, Baxter, Sean, Owens, John D.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Summary Suffix arrays are fundamental full‐text index data structures of importance to a broad spectrum of applications in such fields as bioinformatics, Burrows–Wheeler transform‐based lossless data compression, and information retrieval. In this work, we propose and implement two massively parallel approaches on the graphics processing unit (GPU) based on two classes of suffix array construction algorithms. The first, parallel skew, makes algorithmic improvements to the previous work of Deo and Keely to achieve a speedup of 1.45x over their work. The second, a hybrid skew and prefix‐doubling implementation, is the first of its kind on the GPU and achieves a speedup of 2.3–4.4x over Osipov's prefix‐doubling and 2.4–7.9x over our skew implementation on large datasets. Our implementations rely on two efficient parallel primitives, a merge and a segmented sort. We theoretically analyze the two formulations of suffix array construction algorithms and show performance comparisons on a large variety of practical inputs. We conclude that, with the novel use of our efficient segmented sort, prefix‐doubling is more competitive than skew on the GPU. We also demonstrate the effectiveness of our methods in our implementations of the Burrows‐Wheeler transform and in a parallel full‐text, minute‐space‐index for pattern searching. Copyright © 2016 John Wiley & Sons, Ltd.
ISSN:1532-0626
1532-0634
DOI:10.1002/cpe.3867