Alignment-free distance measure based on return time distribution for sequence analysis: Applications to clustering, molecular phylogeny and subtyping
Published in: | Molecular phylogenetics and evolution 2012-11, Vol.65 (2), p.510-522 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | ► Novel alignment-free method is proposed and applied for molecular phylogeny. ► Method also finds application in viral sero-/geno-typing. ► Method is robust, scalable and faster for genomic sequence analysis. ► Enables compact representation of genomic sequences. ► Provides a new way for pattern recognition and data mining.
The data deluge in the post-genomic era demands the development of novel data mining tools. Existing molecular phylogeny analyses (MPAs) developed for individual gene/protein sequences are alignment-based. However, the size of genomic data and the uncertainties associated with alignments necessitate the development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both alignment-dependent and alignment-free methods. Various alignment-free distance measures based on oligonucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for the relative order of components, viz. nucleotides or amino acids. A new distance measure based on the concept of the ‘return time distribution’ (RTD) of k-mers is proposed, which accounts for both sequence composition and the relative order of components. Statistical parameters of RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross-validation methods. The proposed method was successfully applied for MPA of the family Flaviviridae and subtyping of dengue viruses. The method is observed to retain resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy. |
---|---|
ISSN: | 1055-7903 1095-9513 |
DOI: | 10.1016/j.ympev.2012.07.003 |
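
The summary describes deriving a distance from statistical parameters of the return time distributions of k-mers. The sketch below illustrates that idea only. It is not the authors' implementation, and several choices in it are assumptions: return time is taken as the gap between successive start positions of the same k-mer, the statistical parameters are taken to be the mean and population standard deviation, absent or single-occurrence k-mers are encoded as 0.0, and vectors are compared by Euclidean distance. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def return_times(seq, k):
    """Map each k-mer to the gaps between its successive start positions."""
    last_seen = {}
    gaps = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            gaps[kmer].append(i - last_seen[kmer])
        last_seen[kmer] = i
    return gaps


def rtd_vector(seq, k, alphabet="ACGT"):
    """Summarise every possible k-mer by (mean, std) of its return times.

    Assumption: k-mers with no repeat occurrence contribute (0.0, 0.0).
    """
    gaps = return_times(seq, k)
    vec = []
    for kmer in map("".join, product(alphabet, repeat=k)):
        g = gaps.get(kmer, [])
        vec.append(mean(g) if g else 0.0)
        vec.append(pstdev(g) if g else 0.0)
    return vec


def rtd_distance(a, b, k=2):
    """Euclidean distance between two RTD summary vectors (assumed metric)."""
    va, vb = rtd_vector(a, k), rtd_vector(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Computing `rtd_distance` for every pair of sequences would give a distance matrix of the kind the summary passes to neighbor-joining. Each sequence is reduced to a fixed-length vector of 2·4^k numbers regardless of its length, which is one plausible reading of the "compact representation" mentioned in the highlights.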