Computational Synteny Block: A Framework to Identify Evolutionary Events
Motivation: The identification and accurate description of large genomic rearrangements is crucial for the study of evolutionary events among species and for implicitly defining breakpoints. Although a number of software tools are available to perform this task, they usually either a) require a col...
Published in: | IEEE transactions on nanobioscience 2016-06, Vol.15 (4), p.343-353 |
---|---|
Main Authors: | Arjona-Medina, Jose A. ; Trelles, Oswaldo |
Format: | Magazine article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c328t-2956e677ba1da6ed194f45e10a87afc66ad231b3d46f7f037e90783515de146f3 |
---|---|
cites | cdi_FETCH-LOGICAL-c328t-2956e677ba1da6ed194f45e10a87afc66ad231b3d46f7f037e90783515de146f3 |
container_end_page | 353 |
container_issue | 4 |
container_start_page | 343 |
container_title | IEEE transactions on nanobioscience |
container_volume | 15 |
creator | Arjona-Medina, Jose A. ; Trelles, Oswaldo |
description | Motivation: The identification and accurate description of large genomic rearrangements is crucial for the study of evolutionary events among species and for implicitly defining breakpoints. Although a number of software tools are available to perform this task, they usually either a) require a collection of pre-computed non-conflicting high-scoring segment pairs (HSPs) and gene annotations; or b) work at the protein level (which excludes non-coding regions); or c) need many parameters to adjust the software behavior and performance; or d) entail working with duplications, repeats, and tandem repeats, which complicates the identification of rearrangements. Although there are many programs specialized in the detection of these repetitions, they are not designed for the identification of the main genomic rearrangements. Methods: The methodology we envisage starts with the detection of all HSPs by pairwise genome comparison. The second step resolves conflicts generated by fragments that overlap in both sequences (double-overlapped fragments), yielding a collection of gapped fragments. In the third step, the quality measures (length, score, identities) of each gapped fragment are refined using a modified dynamic programming approach. This collection of refined gapped fragments is the input to a recursive process in which we identify blocks of gapped fragments that maintain co-localization, regardless of whether they occur in coding or non-coding regions. The identification of repeats is an important step in the subsequent refinement of these blocks: it allows repeats to be separated and, in turn, longer blocks to be correctly identified. Finally, groups of repeats, duplications, inversions and translocations are identified.
Results: The set of algorithms presented in this manuscript is able to detect and identify blocks of large rearrangements, taking into account repeats, tandem repeats and duplications, starting from a simple collection of ungapped local alignments. To the best of our knowledge, this is the first method to approach the whole process as a coherent workflow, thus outperforming current state-of-the-art software tools, and it additionally allows the type of rearrangement to be classified. The results obtained are an important source of information for breakpoint refinement and featuring, as well as for estimating the frequencies of evolutionary events to be used in inter-genome distance proposals, etc. Data sets and Supplementary Material are available at: http://bitlab-es.com/gecko-csb/. |
doi_str_mv | 10.1109/TNB.2016.2554150 |
format | magazinearticle |
ieee_id | 7457273 |
linktohtml | https://ieeexplore.ieee.org/document/7457273 |
pmid | 27116744 |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/27116744 |
coden | ITMCEL |
eissn | 1558-2639 |
publisher | United States: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016 |
orcidid | https://orcid.org/0000-0002-5033-4725 |
date | 2016-06-01 |
tpages | 11 |
fulltext | fulltext |
identifier | ISSN: 1536-1241 |
ispartof | IEEE transactions on nanobioscience, 2016-06, Vol.15 (4), p.343-353 |
issn | 1536-1241 ; 1558-2639 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TNB_2016_2554150 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Algorithm design and analysis ; Algorithms ; Bioinformatics ; Breakpoints ; Collection ; computational synteny blocks ; Computer programs ; duplications ; Dynamic programming ; Evolutionary ; Fragments ; Gene loci ; Genomes ; Genomics ; Heuristic algorithms ; Identification ; Nanobioscience ; repeats ; Reproduction ; Software ; synteny blocks ; tandem repeats ; Tasks |
title | Computational Synteny Block: A Framework to Identify Evolutionary Events |
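
The abstract's second step concerns fragments that overlap in both sequences (double-overlapped fragments). The sketch below only illustrates what such a conflict looks like. It is not the authors' algorithm: the `Fragment` type, its field names, the inclusive coordinate convention and the `double_overlaps` helper are all assumptions made for this example, and the record does not describe how the conflicts are actually resolved.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical HSP: an interval on sequence X paired with one on sequence Y.

    Coordinates are assumed to be inclusive; this layout is illustrative only.
    """
    x_start: int
    x_end: int
    y_start: int
    y_end: int


def _intervals_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Two inclusive intervals overlap when each one starts before the other ends.
    return a_start <= b_end and b_start <= a_end


def double_overlaps(frags: List[Fragment]) -> List[Tuple[int, int]]:
    """Return index pairs of fragments that overlap on BOTH sequences."""
    pairs = []
    for i in range(len(frags)):
        for j in range(i + 1, len(frags)):
            a, b = frags[i], frags[j]
            on_x = _intervals_overlap(a.x_start, a.x_end, b.x_start, b.x_end)
            on_y = _intervals_overlap(a.y_start, a.y_end, b.y_start, b.y_end)
            if on_x and on_y:
                pairs.append((i, j))
    return pairs


frags = [
    Fragment(0, 100, 0, 100),
    Fragment(80, 180, 90, 190),    # overlaps the first fragment on X and on Y
    Fragment(150, 250, 400, 500),  # overlaps the second fragment on X only
]
conflicts = double_overlaps(frags)
```

The pairwise scan is quadratic in the number of fragments, which is acceptable for a toy input; a real genome comparison would presumably need a sorted sweep or an interval index.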