
k-Skip-n-Gram-RF: A Random Forest Based Method for Alzheimer's Disease Protein Identification

In this paper, a machine learning based computational method for identifying Alzheimer's disease genes is proposed. Most existing machine learning based methods predict Alzheimer's disease genes by using structural magnetic resonance imaging (MR...

Bibliographic Details
Published in: Frontiers in genetics 2019-02, Vol.10, p.33
Main Authors: Xu, Lei; Liang, Guangmin; Liao, Changrui; Chen, Gin-Den; Chang, Chi-Chang
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c462t-412332cf3de295fbfd0db42bc3e8c57ec8e922bd131ec8f21f447c9508cbc1773
cites cdi_FETCH-LOGICAL-c462t-412332cf3de295fbfd0db42bc3e8c57ec8e922bd131ec8f21f447c9508cbc1773
container_end_page
container_issue
container_start_page 33
container_title Frontiers in genetics
container_volume 10
creator Xu, Lei
Liang, Guangmin
Liao, Changrui
Chen, Gin-Den
Chang, Chi-Chang
description In this paper, a machine learning based computational method for identifying Alzheimer's disease genes is proposed. Most existing machine learning based methods predict Alzheimer's disease genes from structural magnetic resonance imaging (MRI). These methods attain acceptable results, but they are expensive and time consuming. We therefore propose a computational method that identifies Alzheimer's disease genes from the sequence information of proteins and classifies the resulting feature vectors with a random forest. In the proposed method, the gene's protein information is extracted as adaptive k-skip-n-gram features. Experimental results show that the proposed method attains an accuracy of 85.5% on the selected UniProt dataset.
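As an illustration of the feature extraction step described in the abstract, the following is a minimal Python sketch of plain k-skip-bigram (n = 2) frequency features over the 20 standard amino acids. The function name, the fixed n = 2, the normalisation by total pair count, and the skipping of non-standard residues are all assumptions made for illustration; this record does not say how the paper's adaptive variant chooses k or n, so this should not be read as the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_skip_bigram_features(sequence, k):
    """Hypothetical helper: normalised counts of residue pairs that are
    separated by 0..k skipped residues in a protein sequence.

    Returns a list of 400 floats (one per ordered amino-acid pair).
    Pairs containing a non-standard residue are ignored (an assumption).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence)):
        for gap in range(k + 1):
            j = i + gap + 1  # gap = number of residues skipped between i and j
            if j >= len(sequence):
                break
            pair = sequence[i] + sequence[j]
            if pair in counts:
                counts[pair] += 1
                total += 1
    if total == 0:
        # Sequence too short to contain any pair.
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


features = k_skip_bigram_features("ACDA", k=1)
```

Vectors of this kind could then be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`; the choice of library is likewise an assumption, since the record does not name one.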
doi_str_mv 10.3389/fgene.2019.00033
format article
eissn 1664-8021
pmid 30809242
publisher Switzerland: Frontiers Media S.A
rights Copyright © 2019 Xu, Liang, Liao, Chen and Chang
date 2019-02-12
oa free_for_read
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6379451/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6379451/pdf/
fulltext fulltext
identifier ISSN: 1664-8021
ispartof Frontiers in genetics, 2019-02, Vol.10, p.33
issn 1664-8021
1664-8021
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_ddcfeadb7b014ec49d1b132d4527aa0e
source Open Access: PubMed Central
subjects Alzheimer's disease
gene coding
Genetics
n-gram model
random forest
sequence information
title k-Skip-n-Gram-RF: A Random Forest Based Method for Alzheimer's Disease Protein Identification
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T03%3A02%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=k-Skip-n-Gram-RF:%20A%20Random%20Forest%20Based%20Method%20for%20Alzheimer's%20Disease%20Protein%20Identification&rft.jtitle=Frontiers%20in%20genetics&rft.au=Xu,%20Lei&rft.date=2019-02-12&rft.volume=10&rft.spage=33&rft.pages=33-&rft.issn=1664-8021&rft.eissn=1664-8021&rft_id=info:doi/10.3389/fgene.2019.00033&rft_dat=%3Cproquest_doaj_%3E2186625517%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c462t-412332cf3de295fbfd0db42bc3e8c57ec8e922bd131ec8f21f447c9508cbc1773%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2186625517&rft_id=info:pmid/30809242&rfr_iscdi=true