Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method
Published in: | BioSystems 2007-09, Vol.90 (2), p.405-413 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the k-nearest neighbor (k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates that the distributions of multiple classes of enzymes overlap heavily. To cope with the overlap problem, this study proposes an efficient non-parametric classifier for predicting enzyme subfamily class using an adaptive fuzzy k-nearest neighbor (AFK-NN) method, where k and a fuzzy strength parameter m are adaptively specified. The fuzzy membership values of a query sample Q are dynamically determined according to the position of Q and its weighted distances to the k nearest neighbors. Using the same enzymes of the oxidoreductases family for comparison under a jackknife test, the prediction accuracy of AFK-NN is 76.6%, which is better than that of the Support Vector Machine (73.6%), the decision tree method C5.0 (75.4%) and the existing covariant-discriminant algorithm (70.6%). To evaluate the generalization ability of AFK-NN, datasets for all six families of entirely sequenced enzymes are established from the newly updated SWISS-PROT and ENZYME databases. The accuracy of AFK-NN on the new large-scale dataset of the oxidoreductases family is 83.3%, and the mean accuracy over the six families is 92.1%. |
---|---|
ISSN: | 0303-2647 1872-8324 |
DOI: | 10.1016/j.biosystems.2006.10.004 |
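
The summary describes fuzzy memberships computed from a query's distances to its k nearest neighbors and evaluated with a jackknife (leave-one-out) test. The following is a minimal sketch of that general idea, not the authors' AFK-NN: it assumes the classic fuzzy k-NN weighting 1/d^(2/(m-1)), plain Euclidean distance, and fixed values of k and m, whereas the paper specifies k and m adaptively and uses weighted distances. The function names and the toy 2-D points (stand-ins for Am-Pse-AAC feature vectors) are hypothetical.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(query, samples, labels, k=3, m=2.0):
    """Return class memberships of `query` from its k nearest neighbors.

    Assumption: each neighbor votes with weight 1 / d^(2/(m-1)), the
    classic fuzzy k-NN rule with crisp training labels. k and m are
    fixed here; the paper chooses them adaptively.
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    pairs = [(math.dist(query, s), lab) for s, lab in zip(samples, labels)]
    nearest = sorted(pairs, key=lambda p: p[0])[:k]
    eps = 1e-12  # avoids division by zero when the query equals a sample
    weights = defaultdict(float)
    for d, lab in nearest:
        weights[lab] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


def predict(query, samples, labels, k=3, m=2.0):
    """Pick the class with the largest fuzzy membership."""
    memberships = fuzzy_knn_memberships(query, samples, labels, k, m)
    return max(memberships, key=memberships.get)


def jackknife_accuracy(samples, labels, k=3, m=2.0):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if predict(samples[i], rest_x, rest_y, k, m) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: two well-separated classes in 2-D.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["A", "A", "A", "B", "B", "B"]

if __name__ == "__main__":
    print(fuzzy_knn_memberships((0.5, 0.6), X, y, k=6, m=2.0))
    print(jackknife_accuracy(X, y, k=3, m=2.0))
```

In this sketch a larger m flattens the distance weighting, so distant neighbors contribute more evenly, while m close to 1 lets the nearest neighbor dominate. That is one plausible reason to tune m together with k when class distributions overlap.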