Loading…
On a kneading theory for gene-splicing
Two well-known facets in protein synthesis in eukaryotic cells are transcription of DNA to pre-RNA in the nucleus and the translation of messenger-RNA (mRNA) to proteins in the cytoplasm. A critical intermediate step is the removal of segments (introns) containing ∼ 97% of the nucleic-acid sites in...
Saved in:
Published in: | Chaos (Woodbury, N.Y.) N.Y.), 2024-04, Vol.34 (4) |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Two well-known facets in protein synthesis in eukaryotic cells are transcription of DNA to pre-RNA in the nucleus and the translation of messenger-RNA (mRNA) to proteins in the cytoplasm. A critical intermediate step is the removal of segments (introns) containing
∼
97% of the nucleic-acid sites in pre-RNA and sequential alignment of the retained segments (exons) to form mRNA through a process referred to as splicing. Alternative forms of splicing enrich the proteome while abnormal splicing can enhance the likelihood of a cell developing cancer or other diseases. Mechanisms for splicing and origins of splicing errors are only partially deciphered. Our goal is to determine if rules on splicing can be inferred from data analytics on nucleic-acid sequences. Toward that end, we represent a nucleic-acid site as a point in a plane defined in terms of the anterior and posterior sub-sequences of the site. The “point-set” representation expands analytical approaches, including the use of statistical tools, to characterize genome sequences. It is found that point-sets for exons and introns are visually different, and that the differences can be quantified using a family of generalized moments. We design a machine-learning algorithm that can recognize individual exons or introns with
91
% accuracy. Point-set distributions and generalized moments are found to differ between organisms. |
---|---|
ISSN: | 1054-1500 1089-7682 |
DOI: | 10.1063/5.0199364 |