
Racing trees to query partial data


Bibliographic Details
Published in: Soft computing (Berlin, Germany), 2021-07, Vol.25 (14), p.9285-9305
Main Authors: Nguyen, Vu-Linh; Destercke, Sébastien; Masson, Marie-Hélène; Ghassani, Rashad
Format: Article
Language: English
Description
Summary: Dealing with partially known or missing data is a common problem in machine learning. This work addresses the problem of querying the true value of partially known data in order to improve the quality of the learned model. The study falls within active learning, since we assume that the precise value of some partial data can be queried to reduce uncertainty in the learning process, but it can handle any kind of partial data (not only entirely missing values). We propose a querying strategy based on the concept of racing algorithms, in which several models compete. The idea is to identify the query that will help the most to quickly decide the winning model in the competition. After discussing and formalizing the general ideas of our approach, we study the particular case of decision trees with interval-valued features and set-valued labels. The experimental results indicate that the proposed approach significantly outperforms simpler baseline strategies in the case of partially specified features, while it achieves similar performance in the case of partially specified labels.
ISSN: 1432-7643, 1433-7479
DOI: 10.1007/s00500-021-05872-5