A lightweight BLASTP and its implementation on CUDA GPUs

The BLAST server in the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, the service is always the same for every query even though query lengths vary significantly. In fact, the lengths of a large portion of protein sequence...

Bibliographic Details
Published in: The Journal of supercomputing 2021, Vol.77 (1), p.322-342
Main Authors: Huang, Liang-Tsung, Wei, Kai-Cheng, Wu, Chao-Chin, Chen, Chao-Yu, Wang, Jian-An
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63
cites cdi_FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63
container_end_page 342
container_issue 1
container_start_page 322
container_title The Journal of supercomputing
container_volume 77
creator Huang, Liang-Tsung
Wei, Kai-Cheng
Wu, Chao-Chin
Chen, Chao-Yu
Wang, Jian-An
description The BLAST server at the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, the service is the same for every query even though query lengths vary significantly. In fact, a large portion of protein sequences have lengths of less than 500. Moreover, the hit detection process consumes most of the execution time of BLAST, and its core architecture is a lookup table. For these reasons, we propose a lightweight BLASTP for servicing not-too-long queries, built on a hybrid query-index table. Each table entry consists of four bytes that can store up to three query positions. Therefore, a sequence word usually requires only one memory fetch to retrieve its hit information. Furthermore, additional dummy entries are embedded into the table and interleaved with the original entries. Both the entries without any hits and the dummy entries can be used to buffer spilled query positions. These features result in a much smaller lookup table with a higher utilization rate and a lower cache miss ratio. Experimental results show that the lightweight BLASTP outperforms CUDA-BLASTP with speedups ranging from 1.82 to 3.37 based on the first two critical phases.
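The four-byte entry described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual bit layout, so everything below is an assumption made for illustration: each query position is given 9 bits (enough for queries shorter than 512), a 2-bit count sits above the three position fields, and the names `pack_entry` and `unpack_entry` are hypothetical. Spilled positions and dummy entries are not modeled.

```python
# Illustrative sketch only: the real entry layout is not stated in the abstract.
POS_BITS = 9                          # assumption: positions fit in 9 bits (< 512)
POS_MASK = (1 << POS_BITS) - 1
MAX_INLINE = 3                        # from the abstract: up to three positions per entry
COUNT_SHIFT = POS_BITS * MAX_INLINE   # assumption: 2-bit count stored above the positions


def pack_entry(positions):
    """Pack up to three query positions into one 32-bit integer."""
    if len(positions) > MAX_INLINE:
        raise ValueError("more than three positions would have to spill elsewhere")
    entry = len(positions) << COUNT_SHIFT
    for slot, pos in enumerate(positions):
        if not 0 <= pos <= POS_MASK:
            raise ValueError("position does not fit in the assumed 9-bit field")
        entry |= pos << (slot * POS_BITS)
    return entry


def unpack_entry(entry):
    """Recover the stored query positions from a packed entry."""
    count = (entry >> COUNT_SHIFT) & 0b11
    return [(entry >> (slot * POS_BITS)) & POS_MASK for slot in range(count)]
```

Under this assumed layout, `unpack_entry(pack_entry([5, 130, 499]))` gives back the three positions from a single 32-bit value, which is the property the abstract relies on when it says one memory fetch usually suffices per sequence word.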
doi_str_mv 10.1007/s11227-020-03267-1
format article
fullrecord <record><control><sourceid>crossref_sprin</sourceid><recordid>TN_cdi_crossref_primary_10_1007_s11227_020_03267_1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1007_s11227_020_03267_1</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63</originalsourceid><addsrcrecordid>eNp9j1FLwzAQgIMoWKd_wKf8geglaZr2sU6dQsGB23NI2svs6LqRVMR_b-Z8Fo47uLvvuI-QWw53HEDfR86F0AwEMJCi0IyfkYwrLRnkZX5OMqjSqFS5uCRXMW4BIJdaZqSs6dBvPqYvPGb60NTvqyW1Y0f7KdJ-dxhwh-Nkp34_0hTz9WNNF8t1vCYX3g4Rb_7qjKyen1bzF9a8LV7ndcNaKcqJOe65g67TsvItIhY-x1YJ5WRqW4WdbZ3izjvrna80ohIFSF1YZYvKFnJGxOlsG_YxBvTmEPqdDd-Ggzmqm5O6SermV93wBMkTFNPyuMFgtvvPMKY3_6N-APIAW9g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A lightweight BLASTP and its implementation on CUDA GPUs</title><source>Springer Link</source><creator>Huang, Liang-Tsung ; Wei, Kai-Cheng ; Wu, Chao-Chin ; Chen, Chao-Yu ; Wang, Jian-An</creator><creatorcontrib>Huang, Liang-Tsung ; Wei, Kai-Cheng ; Wu, Chao-Chin ; Chen, Chao-Yu ; Wang, Jian-An</creatorcontrib><description>The BLAST server in the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, the service is always the same for every query even though query lengths vary significantly. In fact, the lengths of a large portion of protein sequences are less than 500. On the other hand, the hit detection process consumes the most of the execution time of BLAST and its core architecture is a lookup table. Following the above reasons, we propose a lightweight BLASTP for servicing not-too-long queries, where a hybrid query-index table is proposed accordingly. Each table entry consists of four bytes that can store up to three query positions. 
Therefore, a sequence word usually requires only one memory fetch to retrieve its hit information. Furthermore, additional dummy entries are embedded into the table and interleaved with original entries. The entries without any hits and dummy entries both can be used to buffer spilled query positions. The above features result in a much smaller lookup table with a higher utilization rate and a lower cache miss ratio. Experimental results show that the lightweight BLASTP outperforms CUDA-BLASTP with speedups ranging from 1.82 to 3.37 based on the first two critical phases.</description><identifier>ISSN: 0920-8542</identifier><identifier>EISSN: 1573-0484</identifier><identifier>DOI: 10.1007/s11227-020-03267-1</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Compilers ; Computer Science ; Interpreters ; Processor Architectures ; Programming Languages</subject><ispartof>The Journal of supercomputing, 2021, Vol.77 (1), p.322-342</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63</citedby><cites>FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63</cites><orcidid>0000-0002-0469-9707</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Huang, Liang-Tsung</creatorcontrib><creatorcontrib>Wei, Kai-Cheng</creatorcontrib><creatorcontrib>Wu, Chao-Chin</creatorcontrib><creatorcontrib>Chen, Chao-Yu</creatorcontrib><creatorcontrib>Wang, Jian-An</creatorcontrib><title>A lightweight BLASTP and its implementation on CUDA GPUs</title><title>The Journal of supercomputing</title><addtitle>J 
Supercomput</addtitle><description>The BLAST server in the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, the service is always the same for every query even though query lengths vary significantly. In fact, the lengths of a large portion of protein sequences are less than 500. On the other hand, the hit detection process consumes the most of the execution time of BLAST and its core architecture is a lookup table. Following the above reasons, we propose a lightweight BLASTP for servicing not-too-long queries, where a hybrid query-index table is proposed accordingly. Each table entry consists of four bytes that can store up to three query positions. Therefore, a sequence word usually requires only one memory fetch to retrieve its hit information. Furthermore, additional dummy entries are embedded into the table and interleaved with original entries. The entries without any hits and dummy entries both can be used to buffer spilled query positions. The above features result in a much smaller lookup table with a higher utilization rate and a lower cache miss ratio. 
Experimental results show that the lightweight BLASTP outperforms CUDA-BLASTP with speedups ranging from 1.82 to 3.37 based on the first two critical phases.</description><subject>Compilers</subject><subject>Computer Science</subject><subject>Interpreters</subject><subject>Processor Architectures</subject><subject>Programming Languages</subject><issn>0920-8542</issn><issn>1573-0484</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9j1FLwzAQgIMoWKd_wKf8geglaZr2sU6dQsGB23NI2svs6LqRVMR_b-Z8Fo47uLvvuI-QWw53HEDfR86F0AwEMJCi0IyfkYwrLRnkZX5OMqjSqFS5uCRXMW4BIJdaZqSs6dBvPqYvPGb60NTvqyW1Y0f7KdJ-dxhwh-Nkp34_0hTz9WNNF8t1vCYX3g4Rb_7qjKyen1bzF9a8LV7ndcNaKcqJOe65g67TsvItIhY-x1YJ5WRqW4WdbZ3izjvrna80ohIFSF1YZYvKFnJGxOlsG_YxBvTmEPqdDd-Ggzmqm5O6SermV93wBMkTFNPyuMFgtvvPMKY3_6N-APIAW9g</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Huang, Liang-Tsung</creator><creator>Wei, Kai-Cheng</creator><creator>Wu, Chao-Chin</creator><creator>Chen, Chao-Yu</creator><creator>Wang, Jian-An</creator><general>Springer US</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-0469-9707</orcidid></search><sort><creationdate>2021</creationdate><title>A lightweight BLASTP and its implementation on CUDA GPUs</title><author>Huang, Liang-Tsung ; Wei, Kai-Cheng ; Wu, Chao-Chin ; Chen, Chao-Yu ; Wang, Jian-An</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Compilers</topic><topic>Computer Science</topic><topic>Interpreters</topic><topic>Processor Architectures</topic><topic>Programming Languages</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, 
Liang-Tsung</creatorcontrib><creatorcontrib>Wei, Kai-Cheng</creatorcontrib><creatorcontrib>Wu, Chao-Chin</creatorcontrib><creatorcontrib>Chen, Chao-Yu</creatorcontrib><creatorcontrib>Wang, Jian-An</creatorcontrib><collection>CrossRef</collection><jtitle>The Journal of supercomputing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Liang-Tsung</au><au>Wei, Kai-Cheng</au><au>Wu, Chao-Chin</au><au>Chen, Chao-Yu</au><au>Wang, Jian-An</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A lightweight BLASTP and its implementation on CUDA GPUs</atitle><jtitle>The Journal of supercomputing</jtitle><stitle>J Supercomput</stitle><date>2021</date><risdate>2021</risdate><volume>77</volume><issue>1</issue><spage>322</spage><epage>342</epage><pages>322-342</pages><issn>0920-8542</issn><eissn>1573-0484</eissn><abstract>The BLAST server in the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, the service is always the same for every query even though query lengths vary significantly. In fact, the lengths of a large portion of protein sequences are less than 500. On the other hand, the hit detection process consumes the most of the execution time of BLAST and its core architecture is a lookup table. Following the above reasons, we propose a lightweight BLASTP for servicing not-too-long queries, where a hybrid query-index table is proposed accordingly. Each table entry consists of four bytes that can store up to three query positions. Therefore, a sequence word usually requires only one memory fetch to retrieve its hit information. Furthermore, additional dummy entries are embedded into the table and interleaved with original entries. The entries without any hits and dummy entries both can be used to buffer spilled query positions. 
The above features result in a much smaller lookup table with a higher utilization rate and a lower cache miss ratio. Experimental results show that the lightweight BLASTP outperforms CUDA-BLASTP with speedups ranging from 1.82 to 3.37 based on the first two critical phases.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11227-020-03267-1</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0002-0469-9707</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0920-8542
ispartof The Journal of supercomputing, 2021, Vol.77 (1), p.322-342
issn 0920-8542
1573-0484
language eng
recordid cdi_crossref_primary_10_1007_s11227_020_03267_1
source Springer Link
subjects Compilers
Computer Science
Interpreters
Processor Architectures
Programming Languages
title A lightweight BLASTP and its implementation on CUDA GPUs
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T05%3A20%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20lightweight%20BLASTP%20and%20its%20implementation%20on%20CUDA%20GPUs&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Huang,%20Liang-Tsung&rft.date=2021&rft.volume=77&rft.issue=1&rft.spage=322&rft.epage=342&rft.pages=322-342&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-020-03267-1&rft_dat=%3Ccrossref_sprin%3E10_1007_s11227_020_03267_1%3C/crossref_sprin%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c328t-b1f1b0dd739fceee6f4ec525b31b0a5edacb51bfbafbf97ee5260376a5a69a63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true