
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment


Bibliographic Details
Main Authors: Zeni, Alberto; Guidi, Giulia; Ellis, Marquita; Ding, Nan; Santambrogio, Marco D.; Hofmeyr, Steven; Buluc, Aydin; Oliker, Leonid; Yelick, Katherine
Format: Conference Proceeding
Language: English
cited_by cdi_FETCH-LOGICAL-c329t-283d58ab893e6b18c22a625ed5ef68164dbab9620f75088237ff4185374a0c453
cites
container_end_page 471
container_issue
container_start_page 462
container_title
container_volume 2020
creator Zeni, Alberto
Guidi, Giulia
Ellis, Marquita
Ding, Nan
Santambrogio, Marco D.
Hofmeyr, Steven
Buluc, Aydin
Oliker, Leonid
Yelick, Katherine
description Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, which we named LOGAN. Results show that our high-performance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups of up to 6.6× and 30.7× using 1 and 6 NVIDIA Tesla V100 GPUs, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3× LOGAN speed-up versus ksw2, a state-of-the-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping tool. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment tool called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6×. Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near optimal on the NVIDIA Tesla V100s.
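The early-termination idea in the description (search only for high-quality alignments and stop when none is found) can be illustrated with a small CPU-only sketch. This is not LOGAN's GPU implementation and is not taken from the paper: the function name `xdrop_extend`, the linear gap model, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. The sketch fills the dynamic-programming matrix row by row, discards any cell whose score falls more than `x_drop` below the best score seen so far, and stops once a row has no surviving cells.

```python
def xdrop_extend(query, target, x_drop, match=1, mismatch=-1, gap=-1):
    """Illustrative X-drop extension anchored at the start of both sequences.

    Returns (best_score, cells_evaluated). Scoring values and the linear
    gap model are assumptions for this sketch, not LOGAN's parameters.
    """
    best = 0
    cells = 0

    # Row 0: target characters aligned against gaps only.
    # Keep columns whose score is within x_drop of the best score.
    prev = {}
    for j in range(len(target) + 1):
        score = j * gap
        if score < best - x_drop:
            break
        prev[j] = score

    for i in range(1, len(query) + 1):
        cur = {}
        j = min(prev)            # leftmost surviving column of the previous row
        hi = max(prev) + 1       # one past the rightmost surviving column
        # Continue past hi only while the cell to the left survived.
        while j <= len(target) and (j <= hi or (j - 1) in cur):
            candidates = []
            if j in prev:        # gap in the target (move down)
                candidates.append(prev[j] + gap)
            if (j - 1) in prev:  # match or mismatch (move diagonally)
                sub = match if query[i - 1] == target[j - 1] else mismatch
                candidates.append(prev[j - 1] + sub)
            if (j - 1) in cur:   # gap in the query (move right)
                candidates.append(cur[j - 1] + gap)
            if candidates:
                cells += 1
                score = max(candidates)
                # X-drop test: keep the cell only if it is close to the best.
                if score >= best - x_drop:
                    cur[j] = score
                    best = max(best, score)
            j += 1
        if not cur:              # every cell dropped: terminate early
            break
        prev = cur

    return best, cells


if __name__ == "__main__":
    print(xdrop_extend("ACGT", "ACGT", x_drop=2))
    print(xdrop_extend("AAAAAAAA", "CCCCCCCC", x_drop=2))
```

With identical sequences the best score grows along the diagonal, while two unrelated sequences are abandoned after a few rows, so far fewer cells are evaluated than in the full quadratic matrix. A larger `x_drop` explores a wider band at higher cost.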
doi_str_mv 10.1109/IPDPS47924.2020.00055
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_osti_scitechconnect_1650093</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9139808</ieee_id><sourcerecordid>9139808</sourcerecordid><originalsourceid>FETCH-LOGICAL-c329t-283d58ab893e6b18c22a625ed5ef68164dbab9620f75088237ff4185374a0c453</originalsourceid><addsrcrecordid>eNpNjs1Kw0AYRUdRsNY-gQjB_cRv_mfETX-0LRQNasFdmEy-pJE2KUk2vr2BunB17-LcwyXkjkHMGLiHdbJIPqRxXMYcOMQAoNQZmThjmeGWaWs0nJMRUwIoB6Mu_vUrct113zDshHQj8rR5W05fH6NVVe5ogm3RtAdfB4yWyZbOfId59EUXbXOMNk1d0nf0eTTdV2V9wLq_IZeF33c4-csx2b48f85XdJCu59MNDYK7nnIrcmV9Zp1AnTEbOPeaK8wVFnq4K_PMZ05zKIwCa7kwRSGZVcJID0EqMSb3J2_T9VXaharHsAtNXWPoU6YVgBMDdHuCKkRMj2118O1P6phwFqz4BTIKU3U</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment</title><source>IEEE Xplore All Conference Series</source><creator>Zeni, Alberto ; Guidi, Giulia ; Ellis, Marquita ; Ding, Nan ; Santambrogio, Marco D. ; Hofmeyr, Steven ; Buluc, Aydin ; Oliker, Leonid ; Yelick, Katherine</creator><creatorcontrib>Zeni, Alberto ; Guidi, Giulia ; Ellis, Marquita ; Ding, Nan ; Santambrogio, Marco D. ; Hofmeyr, Steven ; Buluc, Aydin ; Oliker, Leonid ; Yelick, Katherine ; Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)</creatorcontrib><description>Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. 
Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, that we named LOGAN. Results show that our high-performance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups up to 6.6× and 30.7× using 1 and 6 NVIDIA Tesla V100, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3× LOGAN speed-up versus ksw2, a state-of-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping software. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6×. 
Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near optimal on the NVIDIA Tesla V100s.</description><identifier>ISSN: 1530-2075</identifier><identifier>EISSN: 1530-2075</identifier><identifier>EISBN: 9781728168760</identifier><identifier>EISBN: 1728168767</identifier><identifier>DOI: 10.1109/IPDPS47924.2020.00055</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Acceleration ; Computer architecture ; Graphics processing units ; Heuristic algorithms ; MATHEMATICS AND COMPUTING ; Parallel processing ; Software algorithms</subject><ispartof>2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020, Vol.2020, p.462-471</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c329t-283d58ab893e6b18c22a625ed5ef68164dbab9620f75088237ff4185374a0c453</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9139808$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,309,310,314,780,784,789,790,885,27924,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9139808$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.osti.gov/servlets/purl/1650093$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Zeni, Alberto</creatorcontrib><creatorcontrib>Guidi, Giulia</creatorcontrib><creatorcontrib>Ellis, Marquita</creatorcontrib><creatorcontrib>Ding, Nan</creatorcontrib><creatorcontrib>Santambrogio, Marco D.</creatorcontrib><creatorcontrib>Hofmeyr, Steven</creatorcontrib><creatorcontrib>Buluc, Aydin</creatorcontrib><creatorcontrib>Oliker, Leonid</creatorcontrib><creatorcontrib>Yelick, Katherine</creatorcontrib><creatorcontrib>Lawrence Berkeley National 
Laboratory (LBNL), Berkeley, CA (United States)</creatorcontrib><title>LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment</title><title>2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</title><addtitle>IPDPS</addtitle><description>Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, that we named LOGAN. Results show that our high-performance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups up to 6.6× and 30.7× using 1 and 6 NVIDIA Tesla V100, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3× LOGAN speed-up versus ksw2, a state-of-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping software. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6×. 
Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near optimal on the NVIDIA Tesla V100s.</description><subject>Acceleration</subject><subject>Computer architecture</subject><subject>Graphics processing units</subject><subject>Heuristic algorithms</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>Parallel processing</subject><subject>Software algorithms</subject><issn>1530-2075</issn><issn>1530-2075</issn><isbn>9781728168760</isbn><isbn>1728168767</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2020</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNpNjs1Kw0AYRUdRsNY-gQjB_cRv_mfETX-0LRQNasFdmEy-pJE2KUk2vr2BunB17-LcwyXkjkHMGLiHdbJIPqRxXMYcOMQAoNQZmThjmeGWaWs0nJMRUwIoB6Mu_vUrct113zDshHQj8rR5W05fH6NVVe5ogm3RtAdfB4yWyZbOfId59EUXbXOMNk1d0nf0eTTdV2V9wLq_IZeF33c4-csx2b48f85XdJCu59MNDYK7nnIrcmV9Zp1AnTEbOPeaK8wVFnq4K_PMZ05zKIwCa7kwRSGZVcJID0EqMSb3J2_T9VXaharHsAtNXWPoU6YVgBMDdHuCKkRMj2118O1P6phwFqz4BTIKU3U</recordid><startdate>20200501</startdate><enddate>20200501</enddate><creator>Zeni, Alberto</creator><creator>Guidi, Giulia</creator><creator>Ellis, Marquita</creator><creator>Ding, Nan</creator><creator>Santambrogio, Marco D.</creator><creator>Hofmeyr, Steven</creator><creator>Buluc, Aydin</creator><creator>Oliker, Leonid</creator><creator>Yelick, Katherine</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>OIOZB</scope><scope>OTOTI</scope></search><sort><creationdate>20200501</creationdate><title>LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment</title><author>Zeni, Alberto ; Guidi, Giulia ; Ellis, Marquita ; Ding, Nan ; Santambrogio, Marco D. 
; Hofmeyr, Steven ; Buluc, Aydin ; Oliker, Leonid ; Yelick, Katherine</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c329t-283d58ab893e6b18c22a625ed5ef68164dbab9620f75088237ff4185374a0c453</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Acceleration</topic><topic>Computer architecture</topic><topic>Graphics processing units</topic><topic>Heuristic algorithms</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>Parallel processing</topic><topic>Software algorithms</topic><toplevel>online_resources</toplevel><creatorcontrib>Zeni, Alberto</creatorcontrib><creatorcontrib>Guidi, Giulia</creatorcontrib><creatorcontrib>Ellis, Marquita</creatorcontrib><creatorcontrib>Ding, Nan</creatorcontrib><creatorcontrib>Santambrogio, Marco D.</creatorcontrib><creatorcontrib>Hofmeyr, Steven</creatorcontrib><creatorcontrib>Buluc, Aydin</creatorcontrib><creatorcontrib>Oliker, Leonid</creatorcontrib><creatorcontrib>Yelick, Katherine</creatorcontrib><creatorcontrib>Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zeni, Alberto</au><au>Guidi, Giulia</au><au>Ellis, Marquita</au><au>Ding, Nan</au><au>Santambrogio, Marco D.</au><au>Hofmeyr, Steven</au><au>Buluc, Aydin</au><au>Oliker, Leonid</au><au>Yelick, Katherine</au><aucorp>Lawrence 
Berkeley National Laboratory (LBNL), Berkeley, CA (United States)</aucorp><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment</atitle><btitle>2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</btitle><stitle>IPDPS</stitle><date>2020-05-01</date><risdate>2020</risdate><volume>2020</volume><spage>462</spage><epage>471</epage><pages>462-471</pages><issn>1530-2075</issn><eissn>1530-2075</eissn><eisbn>9781728168760</eisbn><eisbn>1728168767</eisbn><abstract>Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, that we named LOGAN. Results show that our high-performance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups up to 6.6× and 30.7× using 1 and 6 NVIDIA Tesla V100, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3× LOGAN speed-up versus ksw2, a state-of-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping software. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6×. 
Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near optimal on the NVIDIA Tesla V100s.</abstract><cop>United States</cop><pub>IEEE</pub><doi>10.1109/IPDPS47924.2020.00055</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1530-2075
ispartof 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020, Vol.2020, p.462-471
issn 1530-2075
1530-2075
language eng
recordid cdi_osti_scitechconnect_1650093
source IEEE Xplore All Conference Series
subjects Acceleration
Computer architecture
Graphics processing units
Heuristic algorithms
MATHEMATICS AND COMPUTING
Parallel processing
Software algorithms
title LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T15%3A24%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=LOGAN:%20High-Performance%20GPU-Based%20X-Drop%20Long-Read%20Alignment&rft.btitle=2020%20IEEE%20International%20Parallel%20and%20Distributed%20Processing%20Symposium%20(IPDPS)&rft.au=Zeni,%20Alberto&rft.aucorp=Lawrence%20Berkeley%20National%20Laboratory%20(LBNL),%20Berkeley,%20CA%20(United%20States)&rft.date=2020-05-01&rft.volume=2020&rft.spage=462&rft.epage=471&rft.pages=462-471&rft.issn=1530-2075&rft.eissn=1530-2075&rft_id=info:doi/10.1109/IPDPS47924.2020.00055&rft.eisbn=9781728168760&rft.eisbn_list=1728168767&rft_dat=%3Cieee_CHZPO%3E9139808%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c329t-283d58ab893e6b18c22a625ed5ef68164dbab9620f75088237ff4185374a0c453%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9139808&rfr_iscdi=true