Quantum mechanical electronic and geometric parameters for DNA k-mers as features for machine learning

Bibliographic Details
Published in: Scientific Data 2024-08, Vol. 11 (1), p. 911-9, Article 911
Main Authors: Masuda, Kairi, Abdullah, Adib A., Pflughaupt, Patrick, Sahakyan, Aleksandr B.
Format: Article
Language: English
Description
Summary: We are witnessing a steep increase in model development initiatives in genomics that employ high-end machine learning methodologies. Of particular interest are models that predict certain genomic characteristics based solely on DNA sequence. These models, however, treat DNA as a mere collection of four letters (A, T, G and C), dismissing past advancements in science that enable the use of more intricate information from nucleic acid sequences. Here, we provide a comprehensive database of quantum mechanical (QM) and geometric features for all permutations of 7-meric DNA in their representative B, A and Z conformations. The database is generated by employing the applicable high-cost and time-consuming QM methodologies. It thus becomes seamless to associate a wealth of novel molecular features with any DNA sequence, by scanning it with a matching k-meric window and pulling the pre-computed values from our database for further use in modelling. We demonstrate the usefulness of our deposited features through their exclusive use in developing a model for A→C mutation rates.
ISSN: 2052-4463
DOI: 10.1038/s41597-024-03772-5
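
Illustration: the summary describes scanning a DNA sequence with a k-meric window and pulling pre-computed values for each window. A minimal Python sketch of that lookup pattern follows. It is not the authors' code: the function names, the dictionary layout, the feature name `feature_x`, and the numeric values are all hypothetical placeholders, and the actual file format of the deposited database is not described in this record.

```python
from typing import Dict, List, Optional

K = 7  # window width, matching the 7-mers mentioned in the summary

# Hypothetical type: one dict of named feature values per k-mer.
FeatureTable = Dict[str, Dict[str, float]]


def scan_kmers(sequence: str, k: int = K) -> List[str]:
    """Return every overlapping k-mer in the sequence, left to right."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range, hence an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def lookup_features(
    sequence: str, table: FeatureTable, k: int = K
) -> List[Optional[Dict[str, float]]]:
    """Map each window to its pre-computed features, or None if absent."""
    return [table.get(kmer) for kmer in scan_kmers(sequence, k)]


# Toy table with made-up values, standing in for the real database.
toy_table: FeatureTable = {
    "ACGTACG": {"feature_x": 0.1},
    "CGTACGT": {"feature_x": 0.2},
}

features = lookup_features("acgtacgt", toy_table)
```

In this sketch, windows that are missing from the table (for example, ones containing ambiguous bases such as N) map to `None`, so the output stays aligned with the window positions and a downstream model can decide how to handle the gaps.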