Loading…

The breakdown of the word symmetry in the human genome

Previous studies have suggested that Chargaff's second rule may hold for relatively long words (above 10nucleotides), but this has not been conclusively shown. In particular, the following questions remain open: Is the phenomenon of symmetry statistically significant? If so, what is the word le...

Full description

Saved in:
Bibliographic Details
Published in:Journal of theoretical biology 2013-10, Vol.335, p.153-159
Main Authors: Afreixo, Vera, Bastos, Carlos A.C., Garcia, Sara P., Rodrigues, João M.O.S., Pinho, Armando J., Ferreira, Paulo J.S.G.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Previous studies have suggested that Chargaff's second rule may hold for relatively long words (above 10nucleotides), but this has not been conclusively shown. In particular, the following questions remain open: Is the phenomenon of symmetry statistically significant? If so, what is the word length above which significance is lost? Can deviations in symmetry due to the finite size of the data be identified? This work addresses these questions by studying word symmetries in the human genome, chromosomes and transcriptome. To rule out finite-length effects, the results are compared with those obtained from random control sequences built to satisfy Chargaff's second parity rule. We use several techniques to evaluate the phenomenon of symmetry, including Pearson's correlation coefficient, total variational distance, a novel word symmetry distance, as well as traditional and equivalence statistical tests. We conclude that word symmetries are statistical significant in the human genome for word lengths up to 6nucleotides. For longer words, we present evidence that the phenomenon may not be as prevalent as previously thought.
ISSN:0022-5193
1095-8541
DOI:10.1016/j.jtbi.2013.06.032