
A novel statistical measure for sequence comparison on the basis of k-word counts

Bibliographic Details
Published in: Journal of Theoretical Biology, 2013-02, Vol. 318, p. 91-100
Main Authors: Yang, Xiwu, Wang, Tianming
Format: Article
Language: English
Description
Summary: Numerous efficient methods based on word counts for sequence analysis have been proposed to characterize DNA sequences to help in comparison, retrieval from the databases and reconstructing evolutionary relations. However, most of them seem unrelated to any intrinsic characteristics of DNA. In this paper, we proposed a novel statistical measure for sequence comparison on the basis of k-word counts. This new measure removed the influence of sequences’ lengths and uncovered bulk property of DNA sequences. The proposed measure was tested by similarity search and phylogenetic analysis. The experimental assessment demonstrated that our similarity measure was efficient.

► The increasing amount of gene sequences calls for efficient computational methods.
► Most of the methods view word frequencies as discrete units separately.
► We focus on correlations and bulk property of k-word.
► A new measure is proposed for sequence comparison on the basis of k-word counts.
► The experimental assessment demonstrated that our similarity measure is efficient.
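
For readers unfamiliar with k-word counts, the following is a minimal Python sketch of the general idea the summary builds on: counting overlapping words of length k in a DNA sequence and dividing by the total so that sequences of different lengths become comparable. It is not the measure proposed in the paper, whose definition this record does not give. The function names (`kword_counts`, `kword_frequencies`, `euclidean_distance`) and the choice of a plain Euclidean distance between frequency vectors are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, other symbols are skipped


def kword_counts(seq, k):
    """Count overlapping k-words in seq, ignoring windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set(ALPHABET):
            counts[word] += 1
    return counts


def kword_frequencies(seq, k):
    """Return relative frequencies for all 4**k possible k-words.

    Dividing by the total number of counted words is a simple (illustrative)
    way to reduce the influence of sequence length.
    """
    counts = kword_counts(seq, k)
    total = sum(counts.values())
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def euclidean_distance(seq_a, seq_b, k):
    """Illustrative dissimilarity: Euclidean distance between frequency vectors."""
    fa = kword_frequencies(seq_a, k)
    fb = kword_frequencies(seq_b, k)
    return sum((fa[w] - fb[w]) ** 2 for w in fa) ** 0.5
```

Calling `euclidean_distance(a, b, k)` on two sequences gives 0.0 when their k-word frequency profiles are identical and larger values as the profiles diverge. Treating each word frequency independently in this way is the kind of approach the highlights contrast with the paper's focus on correlations among k-words.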
ISSN: 0022-5193, 1095-8541
DOI: 10.1016/j.jtbi.2012.10.035