A comparison of ℓ1-regularizion, PCA, KPCA and ICA for dimensionality reduction in logistic regression

Bibliographic Details
Published in: International Journal of Machine Learning and Cybernetics, December 2014, Vol. 5 (6), pp. 861-873
Main Author: Musa, Abdallah Bashir
Format: Article
Language: English
Description
Summary: Extracting relevant information from the original input features, and reducing their dimensionality, is an active research area in machine learning and data analysis. Logistic regression (LR) is a well-known classification method that has been widely used in data mining, machine learning, and bioinformatics. However, its performance suffers from multicollinearity among its predictors and from redundancy among the features. ℓ1-regularization and feature extraction methods are commonly used to improve the performance of logistic regression under multicollinearity and overfitting, and to reduce computational complexity by discarding less relevant or redundant features. The feature extraction methods include principal component analysis (PCA), kernel principal component analysis (KPCA) and independent component analysis (ICA). Recently, ℓ1-regularized logistic regression has received much attention as a promising method for feature selection in classification tasks, so there is a clear need to compare it with these existing methods. In this paper, we assess the performance of the aforementioned feature selection and extraction methods on LR and on ℓ1-regularized LR using several statistical measures. The performance metrics used are accuracy, sensitivity, specificity, precision, the area under the receiver operating characteristic (ROC) curve, and ROC analysis. The study is distinguished by its comprehensive statistical analysis.
ISSN: 1868-8071, 1868-808X
DOI: 10.1007/s13042-013-0171-7
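The summary names the ingredients of the comparison but does not show how they work. Below is a minimal, standard-library-only Python sketch of two of them: ℓ1-regularized logistic regression, which selects features by driving some weights to exactly zero, and the evaluation metrics listed in the summary. This is not the paper's implementation. The solver (proximal gradient descent, also called ISTA), the function names (`fit_l1_logistic`, `predict_proba`, `binary_metrics`, `roc_auc`), the hyperparameters and the toy data are all illustrative assumptions. PCA, KPCA and ICA are not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward 0 and sets |v| <= t exactly to 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Hypothetical helper: l1-regularized logistic regression via proximal gradient (ISTA).

    Minimizes mean log-loss + lam * ||w||_1. The intercept b is not penalized.
    Weights that end at exactly 0.0 correspond to discarded features.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability against the 0/1 label.
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += r * xi[j] / n
            grad_b += r / n
        # Gradient step on the smooth log-loss, then the l1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def _ratio(num, den):
    # Guard against division by zero when a class or prediction is absent.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and precision from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "precision": _ratio(tp, tp + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: feature 0 separates the classes, feature 1 is pure noise
    # (each x0 value appears once with x1 = +1 and once with x1 = -1).
    X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    w, b = fit_l1_logistic(X, y, lam=0.1)
    probs = predict_proba(X, w, b)
    print("weights:", w)  # the noise weight w[1] stays exactly 0.0
    print(binary_metrics(y, [1 if p >= 0.5 else 0 for p in probs]))
    print("AUC:", roc_auc(y, probs))
```

On this invented data, the weight of the noise feature stays at exactly 0.0, while the informative feature gets a positive weight. This is the feature-selection effect that the summary attributes to ℓ1-regularization.

Raising `lam` to 1.0 on the same data sets both weights to zero. That happens because 1.0 exceeds the magnitude of the log-loss gradient at zero weights (0.75 for feature 0 on this balanced data).

For real experiments, a library such as scikit-learn would be the more practical choice. The relevant pieces are `LogisticRegression(penalty="l1", solver="liblinear")`, `PCA`, `KernelPCA`, `FastICA` and `roc_auc_score`.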