Classification and feature selection methods based on fitting logistic regression to PU data

Bibliographic Details
Published in:	Journal of computational science 2023-09, Vol.72, p.102095, Article 102095
Main Authors:	Furmańczyk, Konrad; Paczutkowski, Kacper; Dudziński, Marcin; Dziewa-Dawidczyk, Diana
Format:	Article
Language:	English
Subjects:	Empirical risk minimization; Logistic regression; Mutual information-based feature selection; Positive unlabeled learning; Thresholded Lasso

Description
Summary:	We examine classification methods for positive and unlabeled (PU) data in which the conditional distribution of the true class label given the feature vector follows a logistic regression model. Our first objective is to compute and compare selected metrics for assessing the quality of these methods. To this end, we investigate four methods of posterior probability estimation in which the risk for the logistic loss function is optimized: the naive approach, the weighted likelihood approach, and two more recently proposed methods, the joint approach and the LassoJoint method. The evaluations are performed for 13 machine learning models on selected low- and high-dimensional datasets. Some of these model schemes are taken directly from the literature, and others are obtained by modifying existing procedures. Our second goal is to establish the most stable and efficient approach to posterior probability estimation. Moreover, we use the AdaSampling scheme to compare the considered classification methods. We also compare two feature selection procedures: the Mutual Information-Based feature selection method and the LassoJoint approach. The current article extends the conference paper Furmańczyk et al. (2022).
• Metrics for PU classification obtained using the logistic model.
• The joint method and the LassoJoint method for low- and high-dimensional real datasets.
• Calibration of the parameters of the LassoJoint method.
• The joint method with the Mutual Information-Based criterion.
ISSN:	1877-7503 1877-7511
DOI:	10.1016/j.jocs.2023.102095
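
The summary contrasts the naive approach with the joint approach to posterior probability estimation. The sketch below is only an illustration of that contrast, not the authors' implementation. It assumes a constant labelling frequency c (the "selected completely at random" setting, which the record itself does not state), writes s for the observed label indicator, and models P(s=1|x) = c · σ(xᵀβ) for the joint fit. The function names, the synthetic data, and the plain gradient-ascent optimizer are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_naive(X, s, lr=0.1, n_iter=2000):
    """Naive approach: logistic regression of s on X, i.e. every
    unlabeled example is treated as a negative one."""
    Xb = add_intercept(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        beta += lr * Xb.T @ (s - p) / len(s)   # gradient of mean log-likelihood
    return beta

def fit_joint(X, s, lr=0.1, n_iter=5000):
    """Joint approach (sketch): maximise the likelihood of s under
    P(s=1|x) = c * sigmoid(x'beta) jointly over beta and c.
    c = sigmoid(a) keeps the labelling frequency inside (0, 1)."""
    Xb = add_intercept(X)
    n = len(s)
    beta, a = np.zeros(Xb.shape[1]), 0.0
    for _ in range(n_iter):
        c = sigmoid(a)
        p = sigmoid(Xb @ beta)
        q = np.clip(c * p, 1e-9, 1 - 1e-9)     # modelled P(s=1|x)
        w = s / q - (1 - s) / (1 - q)          # d loglik / d q
        beta += lr * Xb.T @ (w * c * p * (1 - p)) / n
        a += lr * np.sum(w * p * c * (1 - c)) / n
    return beta, sigmoid(a)

# Synthetic PU data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = rng.binomial(1, sigmoid(add_intercept(X) @ np.array([0.0, 1.5, -1.0])))
s = y * rng.binomial(1, 0.5, size=y.shape)     # label each positive with prob. 0.5

beta_naive = fit_naive(X, s)
beta_joint, c_hat = fit_joint(X, s)
naive_post = sigmoid(add_intercept(X) @ beta_naive)   # estimates P(s=1|x), not P(y=1|x)
joint_post = sigmoid(add_intercept(X) @ beta_joint)   # estimate of P(y=1|x)
```

In this sketch the naive fit targets P(s=1|x), so its average fitted probability tracks the share of labelled examples rather than the share of true positives; the joint fit instead separates the labelling frequency from the posterior of the true class.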