Classification and feature selection methods based on fitting logistic regression to PU data
Published in: | Journal of computational science 2023-09, Vol.72, p.102095, Article 102095 |
---|---|
Format: | Article |
Language: | English |
Summary: | We examine classification methods for positive and unlabeled (PU) data in which the conditional distribution of the true class label given the feature vector follows a logistic regression model. Our first objective is to compute and compare selected metrics that assess the quality of these methods. To this end, we investigate four methods of posterior probability estimation in which the risk of the logistic loss function is optimized: the naive approach, the weighted likelihood approach, and two more recently proposed methods, the joint approach and the LassoJoint method. The evaluations cover 13 machine learning models on selected low- and high-dimensional datasets. Some of these model schemes are taken directly from the literature, and others are obtained by modifying existing procedures. Our second goal is to identify the most stable and efficient approach to posterior probability estimation. We also use the AdaSampling scheme to compare the considered classification methods, and we compare two feature selection procedures: the Mutual Information-Based feature selection method and the LassoJoint approach. This article extends the conference paper Furmańczyk et al. (2022).
• Metrics for PU classification obtained using the logistic model. • The joint method and the LassoJoint method for low- and high-dimensional real datasets. • Calibration of the parameters of the LassoJoint method. • The joint method with the Mutual Information-Based Criterion. |
---|---|
ISSN: | 1877-7503 1877-7511 |
DOI: | 10.1016/j.jocs.2023.102095 |
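
The joint approach named in the summary can be illustrated with a small sketch. This record does not give the article's formulas, so the following rests on assumptions: the labeling indicator s satisfies P(s=1 | x) = c · σ(xᵀβ), where σ is the logistic function and c is a constant label frequency (the usual "selected completely at random" assumption), and β and c are estimated together by minimizing the logistic loss for s. The function names, the synthetic data, and the simple gradient descent with step halving are illustrative choices, not the authors' implementation. Under the same assumptions, the naive approach corresponds to fixing c = 1, that is, treating unlabeled observations as negatives.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def joint_nll(beta, a, X, s):
    """Mean negative log-likelihood of the labeling indicator s under the
    assumed model P(s=1|x) = c * sigmoid(x @ beta), with c = sigmoid(a)."""
    q = sigmoid(a) * sigmoid(X @ beta)
    return -np.mean(s * np.log(q) + (1.0 - s) * np.log(1.0 - q))

def joint_grad(beta, a, X, s):
    """Gradient of joint_nll with respect to beta and a."""
    p = sigmoid(X @ beta)
    c = sigmoid(a)
    q = c * p
    w = (q - s) / (1.0 - q)
    g_beta = X.T @ (w * (1.0 - p)) / X.shape[0]
    g_a = np.mean(w * (1.0 - c))
    return g_beta, g_a

def fit_joint(X, s, n_iter=2000, lr=0.5):
    """Gradient descent on (beta, a); a step is accepted only if it does
    not increase the loss, otherwise the step size is halved."""
    beta, a = np.zeros(X.shape[1]), 0.0
    loss = joint_nll(beta, a, X, s)
    for _ in range(n_iter):
        g_beta, g_a = joint_grad(beta, a, X, s)
        new_beta, new_a = beta - lr * g_beta, a - lr * g_a
        new_loss = joint_nll(new_beta, new_a, X, s)
        if new_loss <= loss:
            beta, a, loss = new_beta, new_a, new_loss
        else:
            lr *= 0.5
    return beta, sigmoid(a), loss

# Synthetic PU data (hypothetical parameters chosen for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -1.0])
c_true = 0.6
y = rng.random(n) < sigmoid(X @ true_beta)           # true class, unobserved in practice
s = (y & (rng.random(n) < c_true)).astype(float)     # only some positives are labeled

init_loss = joint_nll(np.zeros(X.shape[1]), 0.0, X, s)
beta_hat, c_hat, final_loss = fit_joint(X, s)
posterior_hat = sigmoid(X @ beta_hat)                # estimate of P(y=1|x)
print("estimated label frequency:", c_hat)
print("loss before and after fitting:", init_loss, final_loss)
```

Writing c as sigmoid(a) keeps the label frequency strictly between 0 and 1 without constrained optimization. The joint likelihood is not convex in (β, c), so a practical implementation would likely use a more careful optimizer and several starting points.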