A Data Stratification Process for Instances Selection Applied to Co-training Semi-supervised Learning Algorithm

Bibliographic Details
Main Authors: Araujo, Yago N.; Vale, Karliane M. O.; Gorgonio, Flavius L.; Canuto, Anne Magaly de P.; Gorgonio, Arthur Costa; Barreto, Cephas A. da S.
Format: Conference Proceeding
Language: English
Description
Summary: Machine Learning (ML) is a field focused on developing methods and algorithms that allow machines to learn from previous data and experiences. ML techniques can be broadly divided into three main approaches: unsupervised, semi-supervised and supervised. Semi-supervised learning has been proposed as an attempt to address some weaknesses of both supervised and unsupervised techniques. Among the several semi-supervised techniques, co-training is one of the most widely used. In its original form, the co-training algorithm selects the instances with the best prediction rates without necessarily considering the class to which they belong, which can be a problem in domains where some classes have low representativeness. This paper proposes including a data stratification strategy in the co-training algorithm in order to maintain, throughout the learning process, the same class representativeness found in the initially labeled data. The co-training algorithm proposed in this work (Co-FlexCon-CS) was adapted from the original Co-FlexCon-C method by adding a data stratification technique. This study evaluates all variations of Co-FlexCon-CS, using 30 datasets with different characteristics and four classification algorithms, and compares them to the original co-training and to Co-FlexCon-C. Finally, a statistical analysis was performed using the Friedman test, and critical difference diagrams were evaluated for all methods using the Nemenyi post-hoc test. The obtained results show that the proposed approach achieves better results than standard co-training, which does not use this strategy.
ISSN: 2161-4407
DOI: 10.1109/IJCNN52387.2021.9533688
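
Illustration: The summary describes selecting newly labeled instances so that class proportions match those of the initially labeled data, instead of simply taking the most confident predictions. The sketch below is one possible reading of that idea and is not the authors' Co-FlexCon-CS implementation. The function name `stratified_select`, the candidate tuple layout, and the largest-remainder rounding of per-class quotas are all assumptions made for illustration; the confidence-threshold handling of Co-FlexCon-C is not modeled.

```python
from collections import Counter


def stratified_select(candidates, labeled_class_counts, n_select):
    """Pick up to n_select candidates while mirroring the initial class mix.

    candidates: list of (instance_id, predicted_label, confidence) tuples
        (hypothetical layout, not taken from the paper).
    labeled_class_counts: dict mapping class label -> count in the
        initially labeled set.
    """
    total = sum(labeled_class_counts.values())

    # Ideal (fractional) share of the selection budget for each class.
    exact = {c: n_select * k / total for c, k in labeled_class_counts.items()}

    # Round down, then hand leftover slots to the largest remainders
    # (an assumed rounding rule).
    quota = {c: int(v) for c, v in exact.items()}
    leftover = n_select - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    # Within each class, keep the most confident predictions up to the quota.
    selected = []
    for c, q in quota.items():
        pool = sorted(
            (x for x in candidates if x[1] == c),
            key=lambda x: x[2],
            reverse=True,
        )
        selected.extend(pool[:q])
    return selected


if __name__ == "__main__":
    # Toy pool: class "a" predictions are all more confident than class "b".
    candidates = [
        (0, "a", 0.99), (1, "a", 0.98), (2, "a", 0.97), (3, "a", 0.96),
        (4, "a", 0.95), (5, "a", 0.94), (6, "b", 0.70), (7, "b", 0.65),
    ]
    picked = stratified_select(candidates, {"a": 8, "b": 2}, n_select=5)
    print(Counter(label for _, label, _ in picked))
```

In this toy pool, taking the five most confident predictions would select only class "a" instances. With an 8:2 initial class ratio, the stratified version reserves one of the five slots for class "b", which is the behavior the summary attributes to the stratification step.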