
Computational Synteny Block: A Framework to Identify Evolutionary Events

Bibliographic Details
Published in: IEEE transactions on nanobioscience 2016-06, Vol.15 (4), p.343-353
Main Authors: Arjona-Medina, Jose A., Trelles, Oswaldo
Format: Magazine article
Language: English
container_end_page 353
container_issue 4
container_start_page 343
container_title IEEE transactions on nanobioscience
container_volume 15
creator Arjona-Medina, Jose A.
Trelles, Oswaldo
description Motivation: The identification and accurate description of large genomic rearrangements is crucial for studying evolutionary events among species and, implicitly, for defining breakpoints. Although a number of software tools are available to perform this task, they usually either a) require a collection of pre-computed non-conflicting high-scoring segment pairs (HSPs) and gene annotations; or b) work at the protein level (which excludes non-coding regions); or c) need many parameters to adjust the software's behavior and performance; or d) involve working with duplications, repeats, and tandem repeats, which complicates the task of identifying rearrangements. Although many programs specialize in detecting these repetitions, they are not designed to identify the main genomic rearrangements.
Methods: The methodology we propose starts with the detection of all HSPs by pairwise genome comparison. The second step resolves the conflicts generated by fragments that overlap in both sequences (double-overlapped fragments), yielding a collection of gapped fragments. In the third step, the quality measures (length, score, identities) of the gapped fragments are refined using a modified dynamic programming approach. This collection of refined gapped fragments is the input to a recursive process in which we identify blocks of gapped fragments that maintain co-localization, regardless of whether they occur in coding or non-coding regions. The identification of repeats is an important step in the subsequent refinement of these blocks: it allows repeats to be separated and, in turn, longer blocks to be identified correctly. Finally, groups of repeats, duplications, inversions and translocations are identified.
Results: The set of algorithms presented in this manuscript is able to detect and identify blocks of large rearrangements (taking into account repeats, tandem repeats and duplications) starting from a simple collection of ungapped local alignments. To the best of our knowledge, this is the first method to approach the whole process as a coherent workflow, thus outperforming current state-of-the-art software tools, and it additionally allows the type of rearrangement to be classified. The results obtained are an important source of information for breakpoint refinement and characterization, as well as for estimating the frequencies of evolutionary events to be used in inter-genome distance proposals, etc. Data sets and Supplementary Material are available at: http://bitlab-es.com/gecko-csb/.
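The second step of the methodology, detecting fragments that overlap in both sequences, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Fragment` class, its field names, and the `double_overlapped_pairs` helper are hypothetical, coordinates are assumed to be inclusive and normalized so that start <= end, and the quadratic pairwise scan is chosen only for readability.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fragment:
    """A local alignment (HSP) between sequence X and sequence Y.

    Hypothetical data model for illustration only; coordinates are
    inclusive and assumed to satisfy start <= end on both sequences.
    """
    x_start: int
    x_end: int
    y_start: int
    y_end: int
    score: int


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Return True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def double_overlapped_pairs(fragments):
    """Return the pairs of fragments that overlap on X and on Y at once."""
    return [
        (f, g)
        for f, g in combinations(fragments, 2)
        if intervals_overlap(f.x_start, f.x_end, g.x_start, g.x_end)
        and intervals_overlap(f.y_start, f.y_end, g.y_start, g.y_end)
    ]


hsps = [
    Fragment(100, 200, 100, 200, score=90),
    Fragment(180, 260, 190, 270, score=70),  # overlaps the first on X and on Y
    Fragment(150, 220, 900, 970, score=60),  # overlaps the others on X only
]
conflicts = double_overlapped_pairs(hsps)
print(len(conflicts))  # number of double-overlapped pairs found
```

Only pairs flagged by such a check would need conflict resolution; how the conflict is then resolved (for example by trimming or merging fragments) is not specified in the abstract and is left out of the sketch.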
doi_str_mv 10.1109/TNB.2016.2554150
format magazinearticle
ieee_id 7457273
pmid 27116744
coden ITMCEL
publisher United States: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016
orcid https://orcid.org/0000-0002-5033-4725
creationdate 2016-06-01
tpages 11
linktohtml https://ieeexplore.ieee.org/document/7457273
backlink https://www.ncbi.nlm.nih.gov/pubmed/27116744 (View this record in MEDLINE/PubMed)
supplementary_material http://bitlab-es.com/gecko-csb/
identifier ISSN: 1536-1241
ispartof IEEE transactions on nanobioscience, 2016-06, Vol.15 (4), p.343-353
issn 1536-1241
1558-2639
language eng
recordid cdi_crossref_primary_10_1109_TNB_2016_2554150
source IEEE Electronic Library (IEL) Journals
subjects Algorithm design and analysis
Algorithms
Bioinformatics
Breakpoints
Collection
computational synteny blocks
Computer programs
duplications
Dynamic programming
Evolutionary
Fragments
Gene loci
Genomes
Genomics
Heuristic algorithms
Identification
Nanobioscience
repeats
Reproduction
Software
synteny blocks
tandem repeats
Tasks
title Computational Synteny Block: A Framework to Identify Evolutionary Events
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T00%3A12%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Computational%20Synteny%20Block:%20A%20Framework%20to%20Identify%20Evolutionary%20Events&rft.jtitle=IEEE%20transactions%20on%20nanobioscience&rft.au=Arjona-Medina,%20Jose%20A.&rft.date=2016-06-01&rft.volume=15&rft.issue=4&rft.spage=343&rft.epage=353&rft.pages=343-353&rft.issn=1536-1241&rft.eissn=1558-2639&rft.coden=ITMCEL&rft_id=info:doi/10.1109/TNB.2016.2554150&rft_dat=%3Cproquest_cross%3E1835580881%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c328t-2956e677ba1da6ed194f45e10a87afc66ad231b3d46f7f037e90783515de146f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1811362716&rft_id=info:pmid/27116744&rft_ieee_id=7457273&rfr_iscdi=true