Loading…
The breakdown of the word symmetry in the human genome
Previous studies have suggested that Chargaff's second rule may hold for relatively long words (above 10nucleotides), but this has not been conclusively shown. In particular, the following questions remain open: Is the phenomenon of symmetry statistically significant? If so, what is the word le...
Saved in:
Published in: | Journal of theoretical biology 2013-10, Vol.335, p.153-159 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Previous studies have suggested that Chargaff's second rule may hold for relatively long words (above 10nucleotides), but this has not been conclusively shown. In particular, the following questions remain open: Is the phenomenon of symmetry statistically significant? If so, what is the word length above which significance is lost? Can deviations in symmetry due to the finite size of the data be identified? This work addresses these questions by studying word symmetries in the human genome, chromosomes and transcriptome. To rule out finite-length effects, the results are compared with those obtained from random control sequences built to satisfy Chargaff's second parity rule. We use several techniques to evaluate the phenomenon of symmetry, including Pearson's correlation coefficient, total variational distance, a novel word symmetry distance, as well as traditional and equivalence statistical tests. We conclude that word symmetries are statistical significant in the human genome for word lengths up to 6nucleotides. For longer words, we present evidence that the phenomenon may not be as prevalent as previously thought. |
---|---|
ISSN: | 0022-5193 1095-8541 |
DOI: | 10.1016/j.jtbi.2013.06.032 |