
Efficient algorithms for regular expression constrained sequence alignment

Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs...


Bibliographic Details
Published in: Information processing letters, 2007-09, Vol. 103 (6), p. 240-246
Main Authors: Chung, Yun-Sheng; Lu, Chin Lung; Tang, Chuan Yi
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3
cites cdi_FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3
container_end_page 246
container_issue 6
container_start_page 240
container_title Information processing letters
container_volume 103
creator Chung, Yun-Sheng
Lu, Chin Lung
Tang, Chuan Yi
description Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to $O(|\Sigma|^2 |V|^4 n^2)$ and $O(|\Sigma|^2 |V|^4 n)$, respectively, where $\Sigma$ is the alphabet, $n$ is the sequence length, and $V$ is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes $O(|V|^3 n^2)$ time and $O(|V|^2 n)$ space in the worst case. If $|V| = O(\log n)$ we propose another algorithm with time complexity $O(|V|^2 \log|V| \, n^2)$. The time complexity of our algorithms is independent of $\Sigma$, which is desirable in protein applications where the formulation of this problem originates; a factor of $|\Sigma|^2 = 400$ in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.
doi_str_mv 10.1016/j.ipl.2007.04.007
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_237284867</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S002001900700110X</els_id><sourcerecordid>1298995601</sourcerecordid><originalsourceid>FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3</originalsourceid><addsrcrecordid>eNp9kMtKxDAUhoMoOI4-gLsiuGw9uTRpcSWDVwbc6DqE5HRM6TRj0hF9ezPMgDtX_-a_nPMRckmhokDlTV_5zVAxAFWBqLIckRltFCslpe0xmQEwKIG2cErOUuoBQAquZuTlvuu89ThOhRlWIfrpY52KLsQi4mo7mFjg9yZiSj6MhQ1jmqLxI7oi4ecWR4s55lfjOheck5PODAkvDjon7w_3b4uncvn6-Ly4W5aW12IqG2UkU45zypvaUFE3HaVCYAOUoTSiVU5YVwvLObIOEKyhBo1twQrphONzcrXv3cSQb0iT7sM2jnlSM65YIxqpsonuTTaGlCJ2ehP92sQfTUHviOleZ2J6R0yD0Fly5vpQbJI1QxfNaH36CzYtSMXa7Lvd-zB_-eUx6rQjaNH5iHbSLvh_Vn4BnhOBEA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>237284867</pqid></control><display><type>article</type><title>Efficient algorithms for regular expression constrained sequence alignment</title><source>ScienceDirect Freedom Collection</source><source>Backfile Package - Computer Science (Legacy) [YCS]</source><source>Backfile Package - Mathematics (Legacy) [YMT]</source><creator>Chung, Yun-Sheng ; Lu, Chin Lung ; Tang, Chuan Yi</creator><creatorcontrib>Chung, Yun-Sheng ; Lu, Chin Lung ; Tang, Chuan Yi</creatorcontrib><description>Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. 
Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O ( | Σ | 2 | V | 4 n 2 ) and O ( | Σ | 2 | V | 4 n ) , respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O ( | V | 3 n 2 ) time and O ( | V | 2 n ) space in the worst case. If | V | = O ( log n ) we propose another algorithm with time complexity O ( | V | 2 log | V | n 2 ) . The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of | Σ | 2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.</description><identifier>ISSN: 0020-0190</identifier><identifier>EISSN: 1872-6119</identifier><identifier>DOI: 10.1016/j.ipl.2007.04.007</identifier><identifier>CODEN: IFPLAT</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Algorithmics. Computability. Computer arithmetics ; Algorithms ; Applied sciences ; Artificial intelligence ; Complexity theory ; Computer science; control theory; systems ; Constrained sequence alignment ; Dynamic programming ; Efficiency ; Exact sciences and technology ; Information systems. Data bases ; Learning and adaptive systems ; Memory organisation. Data processing ; Miscellaneous ; Regular expression ; Software ; Studies ; Theoretical computing</subject><ispartof>Information processing letters, 2007-09, Vol.103 (6), p.240-246</ispartof><rights>2007 Elsevier B.V.</rights><rights>2008 INIST-CNRS</rights><rights>Copyright Elsevier Sequoia S.A. 
Sep 15, 2007</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3</citedby><cites>FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S002001900700110X$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3415,3550,27903,27904,45951,45981</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=18906729$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Chung, Yun-Sheng</creatorcontrib><creatorcontrib>Lu, Chin Lung</creatorcontrib><creatorcontrib>Tang, Chuan Yi</creatorcontrib><title>Efficient algorithms for regular expression constrained sequence alignment</title><title>Information processing letters</title><description>Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O ( | Σ | 2 | V | 4 n 2 ) and O ( | Σ | 2 | V | 4 n ) , respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. 
In this paper we propose a more efficient algorithm solving this problem which takes O ( | V | 3 n 2 ) time and O ( | V | 2 n ) space in the worst case. If | V | = O ( log n ) we propose another algorithm with time complexity O ( | V | 2 log | V | n 2 ) . The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of | Σ | 2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.</description><subject>Algorithmics. Computability. Computer arithmetics</subject><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Complexity theory</subject><subject>Computer science; control theory; systems</subject><subject>Constrained sequence alignment</subject><subject>Dynamic programming</subject><subject>Efficiency</subject><subject>Exact sciences and technology</subject><subject>Information systems. Data bases</subject><subject>Learning and adaptive systems</subject><subject>Memory organisation. 
Data processing</subject><subject>Miscellaneous</subject><subject>Regular expression</subject><subject>Software</subject><subject>Studies</subject><subject>Theoretical computing</subject><issn>0020-0190</issn><issn>1872-6119</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><recordid>eNp9kMtKxDAUhoMoOI4-gLsiuGw9uTRpcSWDVwbc6DqE5HRM6TRj0hF9ezPMgDtX_-a_nPMRckmhokDlTV_5zVAxAFWBqLIckRltFCslpe0xmQEwKIG2cErOUuoBQAquZuTlvuu89ThOhRlWIfrpY52KLsQi4mo7mFjg9yZiSj6MhQ1jmqLxI7oi4ecWR4s55lfjOheck5PODAkvDjon7w_3b4uncvn6-Ly4W5aW12IqG2UkU45zypvaUFE3HaVCYAOUoTSiVU5YVwvLObIOEKyhBo1twQrphONzcrXv3cSQb0iT7sM2jnlSM65YIxqpsonuTTaGlCJ2ehP92sQfTUHviOleZ2J6R0yD0Fly5vpQbJI1QxfNaH36CzYtSMXa7Lvd-zB_-eUx6rQjaNH5iHbSLvh_Vn4BnhOBEA</recordid><startdate>20070915</startdate><enddate>20070915</enddate><creator>Chung, Yun-Sheng</creator><creator>Lu, Chin Lung</creator><creator>Tang, Chuan Yi</creator><general>Elsevier B.V</general><general>Elsevier Science</general><general>Elsevier Sequoia S.A</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20070915</creationdate><title>Efficient algorithms for regular expression constrained sequence alignment</title><author>Chung, Yun-Sheng ; Lu, Chin Lung ; Tang, Chuan Yi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Algorithmics. Computability. 
Computer arithmetics</topic><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Complexity theory</topic><topic>Computer science; control theory; systems</topic><topic>Constrained sequence alignment</topic><topic>Dynamic programming</topic><topic>Efficiency</topic><topic>Exact sciences and technology</topic><topic>Information systems. Data bases</topic><topic>Learning and adaptive systems</topic><topic>Memory organisation. Data processing</topic><topic>Miscellaneous</topic><topic>Regular expression</topic><topic>Software</topic><topic>Studies</topic><topic>Theoretical computing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chung, Yun-Sheng</creatorcontrib><creatorcontrib>Lu, Chin Lung</creatorcontrib><creatorcontrib>Tang, Chuan Yi</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Information processing letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chung, Yun-Sheng</au><au>Lu, Chin Lung</au><au>Tang, Chuan Yi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient algorithms for regular expression constrained sequence alignment</atitle><jtitle>Information processing 
letters</jtitle><date>2007-09-15</date><risdate>2007</risdate><volume>103</volume><issue>6</issue><spage>240</spage><epage>246</epage><pages>240-246</pages><issn>0020-0190</issn><eissn>1872-6119</eissn><coden>IFPLAT</coden><abstract>Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O ( | Σ | 2 | V | 4 n 2 ) and O ( | Σ | 2 | V | 4 n ) , respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O ( | V | 3 n 2 ) time and O ( | V | 2 n ) space in the worst case. If | V | = O ( log n ) we propose another algorithm with time complexity O ( | V | 2 log | V | n 2 ) . The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of | Σ | 2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.ipl.2007.04.007</doi><tpages>7</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0020-0190
ispartof Information processing letters, 2007-09, Vol.103 (6), p.240-246
issn 0020-0190
1872-6119
language eng
recordid cdi_proquest_journals_237284867
source ScienceDirect Freedom Collection; Backfile Package - Computer Science (Legacy) [YCS]; Backfile Package - Mathematics (Legacy) [YMT]
subjects Algorithmics. Computability. Computer arithmetics
Algorithms
Applied sciences
Artificial intelligence
Complexity theory
Computer science; control theory; systems
Constrained sequence alignment
Dynamic programming
Efficiency
Exact sciences and technology
Information systems. Data bases
Learning and adaptive systems
Memory organisation. Data processing
Miscellaneous
Regular expression
Software
Studies
Theoretical computing
title Efficient algorithms for regular expression constrained sequence alignment
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T18%3A04%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20algorithms%20for%20regular%20expression%20constrained%20sequence%20alignment&rft.jtitle=Information%20processing%20letters&rft.au=Chung,%20Yun-Sheng&rft.date=2007-09-15&rft.volume=103&rft.issue=6&rft.spage=240&rft.epage=246&rft.pages=240-246&rft.issn=0020-0190&rft.eissn=1872-6119&rft.coden=IFPLAT&rft_id=info:doi/10.1016/j.ipl.2007.04.007&rft_dat=%3Cproquest_cross%3E1298995601%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c354t-87a627d331385a1458f1144e8012e6a497d4cd54c33e2f0e0ca1aeac90c46d4d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=237284867&rft_id=info:pmid/&rfr_iscdi=true
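
note The complexity claims in the description can be compared directly by dividing the stated worst-case bounds. The ratios below are only quotients of upper bounds taken from the abstract, not measured speedups, and the value $|\Sigma| = 20$ is inferred from the abstract's statement that $|\Sigma|^2 = 400$ for proteins.

$$
\frac{|\Sigma|^2 |V|^4 n^2}{|V|^3 n^2} = |\Sigma|^2 |V| = 400\,|V|
\qquad\text{(time, with } |\Sigma| = 20\text{)}
$$

$$
\frac{|\Sigma|^2 |V|^4 n}{|V|^2 n} = |\Sigma|^2 |V|^2 = 400\,|V|^2
\qquad\text{(space, with } |\Sigma| = 20\text{)}
$$

Under these assumptions the gap between the two time bounds grows linearly in the number of automaton states, and the gap between the space bounds grows quadratically; both are independent of the sequence length $n$.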