Loading…
A comparison of ℓ1-regularizion, PCA, KPCA and ICA for dimensionality reduction in logistic regression
Relevant information extraction and dimensionality reduction of the original input features is an interesting research area in machine learning and data analysis. Logistic regression (LR) is a well-known classification method that has been used widely in many applications of data mining, machine lea...
Saved in:
Published in: | International journal of machine learning and cybernetics 2014-12, Vol.5 (6), p.861-873 |
---|---|
Main Author: | |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Relevant information extraction and dimensionality reduction of the original input features is an interesting research area in machine learning and data analysis. Logistic regression (LR) is a well-known classification method that has been used widely in many applications of data mining, machine learning, and bioinformatics. However, its performance is affected by the multi-co-linearity among its predictors, and the features’ redundancy. ℓ1-regularizion and features extraction methods are commonly used to enhance the performance of logistic regression under multi-co-linearity and ovefitting problems, and to reduce computational complexity by discarding less relevant or redundant features. These methods include principal component analysis, kernel principal component analysis and independent component analysis. Recently, ℓ1-regularized logistic regression has received much attention as a promising method for features selection in classification tasks. So there is a great need to be compared with these existing methods. In this paper, we assess the performance of the aforementioned feature selection methods on LR and ℓ1-regularized logistic regression using different statistical measures. A variety of performance metrics has been utilized: accuracy, sensitivity, specificity, precision, the area under receiver operating characteristic curve and the receiver operating characteristic analysis. This study is distinct by its inclusion of a comprehensive statistical analysis. |
---|---|
ISSN: | 1868-8071 1868-808X |
DOI: | 10.1007/s13042-013-0171-7 |