Numeric promoter description – A comparative view on concepts and general application

Bibliographic Details
Published in:Journal of molecular graphics & modelling 2016-01, Vol.63, p.65-77
Main Authors: Beier, Rico; Labudde, Dirk
Format: Article
Language:English
Description
Highlights:
• We compared a set of different nucleic acid description concepts.
• We evaluated the plausibility of the preferred generated model.
• The major part of descriptive power is attached to positional information.
• Physico-chemical information explicitly merged into descriptors is of minor effect.
• This principle can be transferred to other kinds of functional nucleic acids.

Summary:Nucleic acid molecules play a key role in a variety of biological processes. Beyond storage and transfer tasks, this comprises the triggering of biological processes, regulatory effects and the active influence gained by target binding. Based on the experimental output (in this case promoter sequences), further in silico analyses aid in gaining new insights into these processes and interactions. The numerical description of nucleic acids thereby constitutes a bridge between the concrete biological issues and the analytical methods.

Hence, this study compares 26 descriptor sets obtained by applying well-known numerical description concepts to an established dataset of 38 DNA promoter sequences. The suitability of the descriptor sets was evaluated by computing partial least squares regression models and assessing the model accuracy. We conclude that descriptive power depends mainly on positional information rather than on explicitly incorporated physico-chemical information, since a sufficient amount of implicit physico-chemical information is already encoded in the nucleobase classification. The regression models especially benefited from employing the information that is encoded in the sequential and structural neighborhood of the nucleobases. Accordingly, the analyses of n-grams (short fragments of length n) suggested that they are valuable descriptors for DNA target interactions, and a mixed n-gram descriptor set yielded the best description of the promoter sequences.

The corresponding regression model was checked and found to be plausible, as it was able to reproduce the characteristic binding motifs of promoter sequences to a reasonable degree. As most functional nucleic acids are based on the principle of molecular recognition, the findings are not restricted to promoter sequences, but can rather be transferred to other kinds of functional nucleic acids. Thus, the concepts presented in this study could provide advantages for future nucleic acid-based technologies, like biosensoring, therapeutics and molecular imaging.
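
The n-gram idea named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of n = 1 to 3 for the "mixed" set, and the example sequence (the well-known -10 consensus TATAAT, not a sequence from the study's dataset) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def ngram_counts(seq, n):
    """Count all overlapping n-grams in seq; n-grams that never occur get 0."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    all_ngrams = ("".join(p) for p in product(ALPHABET, repeat=n))
    return {gram: counts.get(gram, 0) for gram in all_ngrams}


def positional_ngrams(seq, n):
    """Return (start position, n-gram) pairs, keeping positional information."""
    seq = seq.upper()
    return [(i, seq[i:i + n]) for i in range(len(seq) - n + 1)]


def mixed_descriptor(seq, sizes=(1, 2, 3)):
    """Concatenate count vectors for several n (hypothetical 'mixed' set)."""
    vector = []
    for n in sizes:
        counts = ngram_counts(seq, n)
        # Sorted keys give every sequence the same column order.
        vector.extend(counts[gram] for gram in sorted(counts))
    return vector


if __name__ == "__main__":
    example = "TATAAT"  # illustrative sequence only
    print(ngram_counts(example, 2)["TA"])
    print(positional_ngrams(example, 3))
    print(len(mixed_descriptor(example)))
```

Vectors of this kind, one per sequence, could then serve as the predictor matrix for a regression method such as partial least squares; that step is left out here because the record gives no details of the model setup.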
ISSN:1093-3263, 1873-4243
DOI:10.1016/j.jmgm.2015.11.011