Computational Synteny Block: A Framework to Identify Evolutionary Events

Bibliographic Details
Published in: IEEE Transactions on Nanobioscience 2016-06, Vol. 15 (4), p. 343-353
Main Authors: Arjona-Medina, Jose A., Trelles, Oswaldo
Format: Magazine article
Language: English
Description
Summary:
Motivation: Identifying and accurately describing large genomic rearrangements is crucial for studying evolutionary events among species and, implicitly, for defining breakpoints. Although a number of software tools are available for this task, they usually either a) require a collection of pre-computed, non-conflicting high-scoring segment pairs (HSPs) and gene annotations; or b) work at the protein level (which excludes non-coding regions); or c) need many parameters to adjust the software's behavior and performance; or d) have to deal with duplications, repeats, and tandem repeats, which complicates the identification of rearrangements. Although many programs specialize in detecting these repetitions, they are not designed to identify the main genomic rearrangements.
Methods: The proposed methodology starts with the detection of all HSPs by pairwise genome comparison. The second step resolves conflicts generated by fragments that overlap in both sequences (double-overlapped fragments), yielding a collection of gapped fragments. In the third step, the quality measures (length, score, identities) of each gapped fragment are refined using a modified dynamic programming approach. This collection of refined gapped fragments is the input to a recursive process that identifies blocks of gapped fragments that maintain co-localization, regardless of whether they occur in coding or non-coding regions. The identification of repeats is an important step in the subsequent refinement of these blocks: separating out the repeats allows longer blocks to be identified correctly. Finally, groups of repeats, duplications, inversions and translocations are identified.
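As a rough illustration of the co-localization idea in the Methods, the sketch below groups fragments into blocks when they are neighbors in both genomes and share an orientation. It is not the authors' algorithm: the `Fragment` fields, the `synteny_blocks` name, and the adjacency rule are assumptions made for illustration, and the overlap resolution, dynamic-programming refinement, and repeat handling described above are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical gapped fragment; field names are illustrative only."""
    x_start: int  # start position in genome X
    y_start: int  # start position in genome Y
    length: int
    strand: str   # "f" (forward) or "r" (reverse)


def synteny_blocks(fragments: List[Fragment]) -> List[List[Fragment]]:
    """Group fragments that are adjacent in both genomes and share a strand.

    Assumed rule: walking along genome X, a fragment extends the current
    block if it has the same strand as the previous fragment and sits
    next to it in genome Y order (rank +1 if forward, -1 if reverse).
    Assumes fragments are distinct and do not overlap.
    """
    by_x = sorted(fragments, key=lambda f: f.x_start)
    by_y = sorted(fragments, key=lambda f: f.y_start)
    y_rank: Dict[Fragment, int] = {f: i for i, f in enumerate(by_y)}

    blocks: List[List[Fragment]] = []
    for frag in by_x:
        if blocks:
            prev = blocks[-1][-1]
            step = y_rank[frag] - y_rank[prev]
            expected = 1 if frag.strand == "f" else -1
            if frag.strand == prev.strand and step == expected:
                blocks[-1].append(frag)
                continue
        # Strand change or broken adjacency: start a new block.
        blocks.append([frag])
    return blocks


def is_inverted(block: List[Fragment]) -> bool:
    """A block made of reverse-strand fragments is treated as an inversion."""
    return block[0].strand == "r"
```

Under these assumptions, a block boundary marks a candidate breakpoint, and a reverse-strand block is a candidate inversion; telling translocations from duplications would need the repeat-aware steps that this sketch leaves out.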
Results: The set of algorithms presented in this manuscript detects and identifies blocks of large rearrangements, taking into account repeats, tandem repeats and duplications, starting from a simple collection of ungapped local alignments. To the best of our knowledge, this is the first method to approach the whole process as a coherent workflow, thus outperforming current state-of-the-art software tools, and it additionally allows the type of rearrangement to be classified. The results obtained are an important source of information for refining and characterizing breakpoints, as well as for estimating the Evolutionary Events frequencies to be used in inter-genome distance proposals, etc. Data s
ISSN: 1536-1241, 1558-2639
DOI: 10.1109/TNB.2016.2554150