A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties
DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the...
Published in: | International journal of molecular sciences 2018-02, Vol.19 (2), p.511 |
---|---|
Main Authors: | Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3 |
---|---|
cites | cdi_FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3 |
container_end_page | |
container_issue | 2 |
container_start_page | 511 |
container_title | International journal of molecular sciences |
container_volume | 19 |
creator | Pan, Gaofeng Jiang, Limin Tang, Jijun Guo, Fei |
description | DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria, namely area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity, are used to evaluate the prediction results of our method. On the benchmark dataset, we reached 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them in prediction accuracy. The improvement in AUC by our method over other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3. |
doi_str_mv | 10.3390/ijms19020511 |
format | article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_7e3ade4ee29248b496e8601f030a3a5e</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_7e3ade4ee29248b496e8601f030a3a5e</doaj_id><sourcerecordid>1999679696</sourcerecordid><originalsourceid>FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3</originalsourceid><addsrcrecordid>eNpdkktvEzEQgFcIRB9w44wsceHQgB9rr31BilIekUqp1N4tx57NOtpdB9tblAP_HZOUKuU0lufzp5nxVNUbgj8wpvBHvxkSUZhiTsiz6pTUlM4wFs3zo_NJdZbSBmPKKFcvqxOqaqIaTk-r33N0He6hR4swbKdssg-j6dF3yF1wqA0RXUIGm_24RpfX831i1-8xdOszJPTL526fuoWfE4wW0HIs74YDY0aHbrpd8jbYDgZvi_wmhi3E7CG9ql60pk_w-iGeV3dfPt8tvs2ufnxdLuZXM1s3Ms_oSjkrqBFcuVoqTrDhtFUSK2aIbRuCJaFcuDIPV4vGlGC5kUS1DEvn2Hm1PGhdMBu9jX4wcaeD8Xp_EeJam1KP7UE3wIyDGoAqWstVrQRIgUmLGTbMcCiuTwfXdloN4CyMOZr-ifRpZvSdXod7zSXnDWNF8P5BEEMZWMp68MlC35sRwpQ0UUqJRgklCvruP3QTplj-J2mKSd0oRYQs1MWBsjGkFKF9LIZg_XdF9PGKFPztcQOP8L-dYH8ABw639A</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2014799168</pqid></control><display><type>article</type><title>A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Pan, Gaofeng ; Jiang, Limin ; Tang, Jijun ; Guo, Fei</creator><creatorcontrib>Pan, Gaofeng ; Jiang, Limin ; Tang, Jijun ; Guo, Fei</creatorcontrib><description>DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. 
In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria, namely area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity, are used to evaluate the prediction results of our method. On the benchmark dataset, we reached 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them in prediction accuracy. The improvement in AUC by our method over other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.</description><identifier>ISSN: 1422-0067</identifier><identifier>ISSN: 1661-6596</identifier><identifier>EISSN: 1422-0067</identifier><identifier>DOI: 10.3390/ijms19020511</identifier><identifier>PMID: 29419752</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Amino acid composition ; Bayesian analysis ; Biochemistry ; Cancer ; Computation ; Computer applications ; Correlation coefficients ; Deoxyribonucleic acid ; Discrete Wavelet Transform ; DNA ; DNA methylation ; Embryo cells ; feature selection ; k-gram ; Learning algorithms ; Methods ; multivariate mutual information ; Nucleotide sequence ; Physicochemical properties ; PseAAC ; scBS-seq profiled mouse embryonic stem cells ; Sensitivity analysis ; Sequences ; Sparse Bayesian learning ; Stem cell transplantation ; Stem cells ; support vector machine</subject><ispartof>International journal of molecular sciences, 2018-02, Vol.19 (2), p.511</ispartof><rights>Copyright 
MDPI AG 2018</rights><rights>2018 by the authors. 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3</citedby><cites>FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3</cites><orcidid>0000-0003-0545-2754</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2014799168/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2014799168?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793,75126</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29419752$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Pan, Gaofeng</creatorcontrib><creatorcontrib>Jiang, Limin</creatorcontrib><creatorcontrib>Tang, Jijun</creatorcontrib><creatorcontrib>Guo, Fei</creatorcontrib><title>A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties</title><title>International journal of molecular sciences</title><addtitle>Int J Mol Sci</addtitle><description>DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. 
In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria, namely area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity, are used to evaluate the prediction results of our method. On the benchmark dataset, we reached 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them in prediction accuracy. The improvement in AUC by our method over other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.</description><subject>Amino acid composition</subject><subject>Bayesian analysis</subject><subject>Biochemistry</subject><subject>Cancer</subject><subject>Computation</subject><subject>Computer applications</subject><subject>Correlation coefficients</subject><subject>Deoxyribonucleic acid</subject><subject>Discrete Wavelet Transform</subject><subject>DNA</subject><subject>DNA methylation</subject><subject>Embryo cells</subject><subject>feature selection</subject><subject>k-gram</subject><subject>Learning algorithms</subject><subject>Methods</subject><subject>multivariate mutual information</subject><subject>Nucleotide sequence</subject><subject>Physicochemical properties</subject><subject>PseAAC</subject><subject>scBS-seq profiled mouse embryonic stem cells</subject><subject>Sensitivity analysis</subject><subject>Sequences</subject><subject>Sparse Bayesian learning</subject><subject>Stem cell transplantation</subject><subject>Stem 
cells</subject><subject>support vector machine</subject><issn>1422-0067</issn><issn>1661-6596</issn><issn>1422-0067</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpdkktvEzEQgFcIRB9w44wsceHQgB9rr31BilIekUqp1N4tx57NOtpdB9tblAP_HZOUKuU0lufzp5nxVNUbgj8wpvBHvxkSUZhiTsiz6pTUlM4wFs3zo_NJdZbSBmPKKFcvqxOqaqIaTk-r33N0He6hR4swbKdssg-j6dF3yF1wqA0RXUIGm_24RpfX831i1-8xdOszJPTL526fuoWfE4wW0HIs74YDY0aHbrpd8jbYDgZvi_wmhi3E7CG9ql60pk_w-iGeV3dfPt8tvs2ufnxdLuZXM1s3Ms_oSjkrqBFcuVoqTrDhtFUSK2aIbRuCJaFcuDIPV4vGlGC5kUS1DEvn2Hm1PGhdMBu9jX4wcaeD8Xp_EeJam1KP7UE3wIyDGoAqWstVrQRIgUmLGTbMcCiuTwfXdloN4CyMOZr-ifRpZvSdXod7zSXnDWNF8P5BEEMZWMp68MlC35sRwpQ0UUqJRgklCvruP3QTplj-J2mKSd0oRYQs1MWBsjGkFKF9LIZg_XdF9PGKFPztcQOP8L-dYH8ABw639A</recordid><startdate>20180208</startdate><enddate>20180208</enddate><creator>Pan, Gaofeng</creator><creator>Jiang, Limin</creator><creator>Tang, Jijun</creator><creator>Guo, Fei</creator><general>MDPI AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>MBDVC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-0545-2754</orcidid></search><sort><creationdate>20180208</creationdate><title>A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical 
Properties</title><author>Pan, Gaofeng ; Jiang, Limin ; Tang, Jijun ; Guo, Fei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Amino acid composition</topic><topic>Bayesian analysis</topic><topic>Biochemistry</topic><topic>Cancer</topic><topic>Computation</topic><topic>Computer applications</topic><topic>Correlation coefficients</topic><topic>Deoxyribonucleic acid</topic><topic>Discrete Wavelet Transform</topic><topic>DNA</topic><topic>DNA methylation</topic><topic>Embryo cells</topic><topic>feature selection</topic><topic>k-gram</topic><topic>Learning algorithms</topic><topic>Methods</topic><topic>multivariate mutual information</topic><topic>Nucleotide sequence</topic><topic>Physicochemical properties</topic><topic>PseAAC</topic><topic>scBS-seq profiled mouse embryonic stem cells</topic><topic>Sensitivity analysis</topic><topic>Sequences</topic><topic>Sparse Bayesian learning</topic><topic>Stem cell transplantation</topic><topic>Stem cells</topic><topic>support vector machine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pan, Gaofeng</creatorcontrib><creatorcontrib>Jiang, Limin</creatorcontrib><creatorcontrib>Tang, Jijun</creatorcontrib><creatorcontrib>Guo, Fei</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research 
Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>International journal of molecular sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pan, Gaofeng</au><au>Jiang, Limin</au><au>Tang, Jijun</au><au>Guo, Fei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties</atitle><jtitle>International journal of molecular sciences</jtitle><addtitle>Int J Mol 
Sci</addtitle><date>2018-02-08</date><risdate>2018</risdate><volume>19</volume><issue>2</issue><spage>511</spage><pages>511-</pages><issn>1422-0067</issn><issn>1661-6596</issn><eissn>1422-0067</eissn><abstract>DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria, namely area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity, are used to evaluate the prediction results of our method. On the benchmark dataset, we reached 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them in prediction accuracy. The improvement in AUC by our method over other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>29419752</pmid><doi>10.3390/ijms19020511</doi><orcidid>https://orcid.org/0000-0003-0545-2754</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1422-0067 |
ispartof | International journal of molecular sciences, 2018-02, Vol.19 (2), p.511 |
issn | 1422-0067 1661-6596 1422-0067 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_7e3ade4ee29248b496e8601f030a3a5e |
source | Publicly Available Content Database; PubMed Central |
subjects | Amino acid composition; Bayesian analysis; Biochemistry; Cancer; Computation; Computer applications; Correlation coefficients; Deoxyribonucleic acid; Discrete Wavelet Transform; DNA; DNA methylation; Embryo cells; feature selection; k-gram; Learning algorithms; Methods; multivariate mutual information; Nucleotide sequence; Physicochemical properties; PseAAC; scBS-seq profiled mouse embryonic stem cells; Sensitivity analysis; Sequences; Sparse Bayesian learning; Stem cell transplantation; Stem cells; support vector machine |
title | A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T00%3A59%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Novel%20Computational%20Method%20for%20Detecting%20DNA%20Methylation%20Sites%20with%20DNA%20Sequence%20Information%20and%20Physicochemical%20Properties&rft.jtitle=International%20journal%20of%20molecular%20sciences&rft.au=Pan,%20Gaofeng&rft.date=2018-02-08&rft.volume=19&rft.issue=2&rft.spage=511&rft.pages=511-&rft.issn=1422-0067&rft.eissn=1422-0067&rft_id=info:doi/10.3390/ijms19020511&rft_dat=%3Cproquest_doaj_%3E1999679696%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c478t-2b9dc62a659d489510a52f98093a1cf71081256d339d467a39dc5a819f308dd3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2014799168&rft_id=info:pmid/29419752&rfr_iscdi=true |
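
The description in this record names k-gram features and several threshold-based evaluation criteria (ACC, SN, specificity, MCC) without defining them. The following is a minimal sketch of what such quantities commonly look like, not the authors' implementation: the function names `kgram_features` and `binary_metrics`, the choice of normalized k-mer frequencies over the alphabet ACGT, and the toy labels are all assumptions made for illustration. AUC is omitted because it needs ranked scores rather than hard labels.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumed nucleotide alphabet; other symbols (e.g. N) are not reported


def kgram_features(seq, k=2):
    """Normalized frequency of every length-k substring over ACGT (hypothetical helper).

    The output has 4**k entries in lexicographic order. Windows containing
    symbols outside ACGT still count toward the denominator.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for very short input
    return [counts["".join(p)] / total for p in product(ALPHABET, repeat=k)]


def binary_metrics(y_true, y_pred):
    """ACC, SN, SP and MCC from 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(pairs) if pairs else 0.0,
        "SN": tp / (tp + fn) if tp + fn else 0.0,   # sensitivity (recall on positives)
        "SP": tn / (tn + fp) if tn + fp else 0.0,   # specificity (recall on negatives)
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    print(kgram_features("ACGTAC", k=1))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

A feature vector built this way could be concatenated with other descriptors before training a classifier; how the paper combines its four feature types is not stated in this record.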