Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method
Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the k-nearest neighbor (k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates the...
Published in: | BioSystems 2007-09, Vol.90 (2), p.405-413 |
---|---|
Main Authors: | Huang, Wen-Lin; Chen, Hung-Ming; Hwang, Shiow-Fen; Ho, Shinn-Ying |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63 |
---|---|
cites | cdi_FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63 |
container_end_page | 413 |
container_issue | 2 |
container_start_page | 405 |
container_title | BioSystems |
container_volume | 90 |
creator | Huang, Wen-Lin ; Chen, Hung-Ming ; Hwang, Shiow-Fen ; Ho, Shinn-Ying |
description | Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the k-nearest neighbor (k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates that the distributions of multiple classes of enzymes overlap heavily. To cope with the overlap problem, this study proposes an efficient non-parametric classifier for predicting enzyme subfamily class using an adaptive fuzzy k-nearest neighbor (AFK-NN) method, where k and a fuzzy strength parameter m are adaptively specified. The fuzzy membership values of a query sample Q are dynamically determined according to the position of Q and its weighted distances to the k nearest neighbors. Using the same enzymes of the oxidoreductases family for comparisons, the prediction accuracy of AFK-NN is 76.6%, which is better than those of Support Vector Machine (73.6%), the decision tree method C5.0 (75.4%) and the existing covariant-discriminate algorithm (70.6%) using a jackknife test. To evaluate the generalization ability of AFK-NN, the datasets for all six families of entirely sequenced enzymes are established from the newly updated SWISS-PROT and ENZYME databases. The accuracy of AFK-NN on the new large-scale dataset of the oxidoreductases family is 83.3%, and the mean accuracy of the six families is 92.1%. |
doi_str_mv | 10.1016/j.biosystems.2006.10.004 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_68430080</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0303264706002085</els_id><sourcerecordid>68430080</sourcerecordid><originalsourceid>FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63</originalsourceid><addsrcrecordid>eNqFkMlOwzAQhi0EoqXwCsgnbinOZptjqdikSlzgbDn2uHVJ4mAnldKnx1Ur9cjMYaSZf7YPIZySeUpS-ridV9aFMfTQhHlGCI3pOSHFBZqmnGUJz7PiEk1JTvIkowWboJsQtiRaydNrNElZWhCWlVMkFkoNXvaAOw_aqt66FjuDod2PDeAwVEY2th6xqmUIeAi2XWPZYqll19sdYDPs9yP-SVqQHkKPW7DrTeU8bqDfOH2LroysA9yd4gx9v758Ld-T1efbx3KxSlTOsj6pqFKUE52biph4HdeaGV7pHChnCnhJC6ONliY6qBIYzYA_lUZWJVNG0nyGHo5zO-9-h3iIaGxQUNeyBTcEQXmRE8JJFPKjUHkXggcjOm8b6UeREnGAK7biDFcc4B4qEW5svT_tGKoG9LnxRDMKno8CiJ_uLHgRlIVWRbAeVC-0s_9v-QM6GpRA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>68430080</pqid></control><display><type>article</type><title>Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method</title><source>ScienceDirect Freedom Collection</source><creator>Huang, Wen-Lin ; Chen, Hung-Ming ; Hwang, Shiow-Fen ; Ho, Shinn-Ying</creator><creatorcontrib>Huang, Wen-Lin ; Chen, Hung-Ming ; Hwang, Shiow-Fen ; Ho, Shinn-Ying</creatorcontrib><description>Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the
k-nearest neighbor (
k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates the distributions of multiple classes of enzymes are highly overlapped. To cope with the overlap problem, this study proposes an efficient non-parametric classifier for predicting enzyme subfamily class using an adaptive fuzzy
k-nearest neighbor (AFK-NN) method, where
k and a fuzzy strength parameter
m are adaptively specified. The fuzzy membership values of a query sample
Q are dynamically determined according to the position of
Q and its weighted distances to the
k nearest neighbors. Using the same enzymes of the oxidoreductases family for comparisons, the prediction accuracy of AFK-NN is 76.6%, which is better than those of Support Vector Machine (73.6%), the decision tree method C5.0 (75.4%) and the existing covariant-discriminate algorithm (70.6%) using a jackknife test. To evaluate the generalization ability of AFK-NN, the datasets for all six families of entirely sequenced enzymes are established from the newly updated SWISS-PROT and ENZYME database. The accuracy of AFK-NN on the new large-scale dataset of oxidoreductases family is 83.3%, and the mean accuracy of the six families is 92.1%.</description><identifier>ISSN: 0303-2647</identifier><identifier>EISSN: 1872-8324</identifier><identifier>DOI: 10.1016/j.biosystems.2006.10.004</identifier><identifier>PMID: 17140725</identifier><language>eng</language><publisher>Ireland: Elsevier Ireland Ltd</publisher><subject>Algorithms ; Amino acid composition ; Amino Acids - chemistry ; Animals ; Computer Simulation ; Databases, Protein ; Enzyme subfamily class prediction ; Enzymes - chemistry ; Fuzzy Logic ; Fuzzy theory ; Genetic Vectors ; k-Nearest neighbor ; Models, Statistical ; Models, Theoretical ; Oxidoreductases - genetics ; Reproducibility of Results ; Sequence Alignment ; Sequence Analysis, Protein ; Support vector machine ; Systems Biology</subject><ispartof>BioSystems, 2007-09, Vol.90 (2), p.405-413</ispartof><rights>2006 Elsevier Ireland 
Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63</citedby><cites>FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/17140725$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Huang, Wen-Lin</creatorcontrib><creatorcontrib>Chen, Hung-Ming</creatorcontrib><creatorcontrib>Hwang, Shiow-Fen</creatorcontrib><creatorcontrib>Ho, Shinn-Ying</creatorcontrib><title>Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method</title><title>BioSystems</title><addtitle>Biosystems</addtitle><description>Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the
k-nearest neighbor (
k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates the distributions of multiple classes of enzymes are highly overlapped. To cope with the overlap problem, this study proposes an efficient non-parametric classifier for predicting enzyme subfamily class using an adaptive fuzzy
k-nearest neighbor (AFK-NN) method, where
k and a fuzzy strength parameter
m are adaptively specified. The fuzzy membership values of a query sample
Q are dynamically determined according to the position of
Q and its weighted distances to the
k nearest neighbors. Using the same enzymes of the oxidoreductases family for comparisons, the prediction accuracy of AFK-NN is 76.6%, which is better than those of Support Vector Machine (73.6%), the decision tree method C5.0 (75.4%) and the existing covariant-discriminate algorithm (70.6%) using a jackknife test. To evaluate the generalization ability of AFK-NN, the datasets for all six families of entirely sequenced enzymes are established from the newly updated SWISS-PROT and ENZYME database. The accuracy of AFK-NN on the new large-scale dataset of oxidoreductases family is 83.3%, and the mean accuracy of the six families is 92.1%.</description><subject>Algorithms</subject><subject>Amino acid composition</subject><subject>Amino Acids - chemistry</subject><subject>Animals</subject><subject>Computer Simulation</subject><subject>Databases, Protein</subject><subject>Enzyme subfamily class prediction</subject><subject>Enzymes - chemistry</subject><subject>Fuzzy Logic</subject><subject>Fuzzy theory</subject><subject>Genetic Vectors</subject><subject>k-Nearest neighbor</subject><subject>Models, Statistical</subject><subject>Models, Theoretical</subject><subject>Oxidoreductases - genetics</subject><subject>Reproducibility of Results</subject><subject>Sequence Alignment</subject><subject>Sequence Analysis, Protein</subject><subject>Support vector machine</subject><subject>Systems 
Biology</subject><issn>0303-2647</issn><issn>1872-8324</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><recordid>eNqFkMlOwzAQhi0EoqXwCsgnbinOZptjqdikSlzgbDn2uHVJ4mAnldKnx1Ur9cjMYaSZf7YPIZySeUpS-ridV9aFMfTQhHlGCI3pOSHFBZqmnGUJz7PiEk1JTvIkowWboJsQtiRaydNrNElZWhCWlVMkFkoNXvaAOw_aqt66FjuDod2PDeAwVEY2th6xqmUIeAi2XWPZYqll19sdYDPs9yP-SVqQHkKPW7DrTeU8bqDfOH2LroysA9yd4gx9v758Ld-T1efbx3KxSlTOsj6pqFKUE52biph4HdeaGV7pHChnCnhJC6ONliY6qBIYzYA_lUZWJVNG0nyGHo5zO-9-h3iIaGxQUNeyBTcEQXmRE8JJFPKjUHkXggcjOm8b6UeREnGAK7biDFcc4B4qEW5svT_tGKoG9LnxRDMKno8CiJ_uLHgRlIVWRbAeVC-0s_9v-QM6GpRA</recordid><startdate>20070901</startdate><enddate>20070901</enddate><creator>Huang, Wen-Lin</creator><creator>Chen, Hung-Ming</creator><creator>Hwang, Shiow-Fen</creator><creator>Ho, Shinn-Ying</creator><general>Elsevier Ireland Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20070901</creationdate><title>Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method</title><author>Huang, Wen-Lin ; Chen, Hung-Ming ; Hwang, Shiow-Fen ; Ho, Shinn-Ying</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Algorithms</topic><topic>Amino acid composition</topic><topic>Amino Acids - chemistry</topic><topic>Animals</topic><topic>Computer Simulation</topic><topic>Databases, Protein</topic><topic>Enzyme subfamily class prediction</topic><topic>Enzymes - chemistry</topic><topic>Fuzzy Logic</topic><topic>Fuzzy theory</topic><topic>Genetic Vectors</topic><topic>k-Nearest neighbor</topic><topic>Models, 
Statistical</topic><topic>Models, Theoretical</topic><topic>Oxidoreductases - genetics</topic><topic>Reproducibility of Results</topic><topic>Sequence Alignment</topic><topic>Sequence Analysis, Protein</topic><topic>Support vector machine</topic><topic>Systems Biology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Wen-Lin</creatorcontrib><creatorcontrib>Chen, Hung-Ming</creatorcontrib><creatorcontrib>Hwang, Shiow-Fen</creatorcontrib><creatorcontrib>Ho, Shinn-Ying</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>BioSystems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Wen-Lin</au><au>Chen, Hung-Ming</au><au>Hwang, Shiow-Fen</au><au>Ho, Shinn-Ying</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method</atitle><jtitle>BioSystems</jtitle><addtitle>Biosystems</addtitle><date>2007-09-01</date><risdate>2007</risdate><volume>90</volume><issue>2</issue><spage>405</spage><epage>413</epage><pages>405-413</pages><issn>0303-2647</issn><eissn>1872-8324</eissn><abstract>Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the
k-nearest neighbor (
k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates the distributions of multiple classes of enzymes are highly overlapped. To cope with the overlap problem, this study proposes an efficient non-parametric classifier for predicting enzyme subfamily class using an adaptive fuzzy
k-nearest neighbor (AFK-NN) method, where
k and a fuzzy strength parameter
m are adaptively specified. The fuzzy membership values of a query sample
Q are dynamically determined according to the position of
Q and its weighted distances to the
k nearest neighbors. Using the same enzymes of the oxidoreductases family for comparisons, the prediction accuracy of AFK-NN is 76.6%, which is better than those of Support Vector Machine (73.6%), the decision tree method C5.0 (75.4%) and the existing covariant-discriminate algorithm (70.6%) using a jackknife test. To evaluate the generalization ability of AFK-NN, the datasets for all six families of entirely sequenced enzymes are established from the newly updated SWISS-PROT and ENZYME database. The accuracy of AFK-NN on the new large-scale dataset of oxidoreductases family is 83.3%, and the mean accuracy of the six families is 92.1%.</abstract><cop>Ireland</cop><pub>Elsevier Ireland Ltd</pub><pmid>17140725</pmid><doi>10.1016/j.biosystems.2006.10.004</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0303-2647 |
ispartof | BioSystems, 2007-09, Vol.90 (2), p.405-413 |
issn | 0303-2647 ; 1872-8324 |
language | eng |
recordid | cdi_proquest_miscellaneous_68430080 |
source | ScienceDirect Freedom Collection |
subjects | Algorithms ; Amino acid composition ; Amino Acids - chemistry ; Animals ; Computer Simulation ; Databases, Protein ; Enzyme subfamily class prediction ; Enzymes - chemistry ; Fuzzy Logic ; Fuzzy theory ; Genetic Vectors ; k-Nearest neighbor ; Models, Statistical ; Models, Theoretical ; Oxidoreductases - genetics ; Reproducibility of Results ; Sequence Alignment ; Sequence Analysis, Protein ; Support vector machine ; Systems Biology |
title | Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T03%3A58%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accurate%20prediction%20of%20enzyme%20subfamily%20class%20using%20an%20adaptive%20fuzzy%20k-nearest%20neighbor%20method&rft.jtitle=BioSystems&rft.au=Huang,%20Wen-Lin&rft.date=2007-09-01&rft.volume=90&rft.issue=2&rft.spage=405&rft.epage=413&rft.pages=405-413&rft.issn=0303-2647&rft.eissn=1872-8324&rft_id=info:doi/10.1016/j.biosystems.2006.10.004&rft_dat=%3Cproquest_cross%3E68430080%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c372t-b6cc680d3fb0f1718dd7f8bd3e687ce8564fdfdafafaec5e762e895fab57cfa63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=68430080&rft_id=info:pmid/17140725&rfr_iscdi=true |
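
The abstract describes fuzzy membership values computed from a query sample's weighted distances to its k nearest neighbors, with k and the fuzzy strength m chosen adaptively and accuracy measured by a jackknife test. The record does not give the AFK-NN formulas, so the following is only a minimal sketch under stated assumptions: it uses the widely used fuzzy k-NN weighting 1/d^(2/(m-1)) with crisp training labels, and it stands in for the adaptive step with a plain grid search over (k, m) scored by leave-one-out accuracy. All function names and the toy data are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(train, query, k=3, m=2.0):
    """Return {label: membership} for `query`.

    train: list of (feature_vector, label) pairs with crisp labels.
    m > 1 is the fuzzy strength (assumed standard fuzzy k-NN weighting).
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    nearest = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    for distance, label in nearest:
        # Inverse-distance weight; the small floor avoids division by zero.
        weights[label] += 1.0 / max(distance, 1e-12) ** exponent
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def predict(train, query, k=3, m=2.0):
    """Predict the label with the largest fuzzy membership."""
    memberships = fuzzy_knn_memberships(train, query, k, m)
    return max(memberships, key=memberships.get)


def jackknife_accuracy(samples, k=3, m=2.0):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += predict(rest, x, k, m) == label
    return hits / len(samples)


def select_k_m(samples, k_values=(1, 3, 5), m_values=(1.5, 2.0, 3.0)):
    """Assumed stand-in for adaptive selection: best (k, m) on a small grid."""
    return max(
        ((k, m) for k in k_values for m in m_values),
        key=lambda km: jackknife_accuracy(samples, *km),
    )


# Hypothetical two-class toy data in a 2-D feature space.
toy = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B"),
]
print(predict(toy, (0.15, 0.1)))
print(jackknife_accuracy(toy, k=3, m=2.0))
print(select_k_m(toy))
```

In this sketch a larger m flattens the distance weighting, so distant neighbors contribute more to the memberships. Real enzyme data would use Am-Pse-AAC feature vectors in place of the toy points.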