On removing conflicts for machine learning
Published in: | Expert systems with applications 2022-11, Vol.206, p.117835, Article 117835 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set contains learning conflicts that degrade the performance of the ML system. A learning conflict arises when two or more samples in a dataset have similar input values but different target values. In this work, we propose a method to remove learning conflicts from a dataset. The method is based on a Genetic Algorithm (GA) that tries to keep the samples that are free of conflicts and to remove the samples that have conflicts. Each individual in the GA represents a possible dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while removing learning conflicts. The fitness function also includes the Mean-Squared Error (MSE), which validates the machine learning performance. The algorithm is designed to keep as many samples as possible while the ML system achieves the highest possible performance. The proposal therefore consists of first cleaning the dataset: candidate datasets are compared, and the individual with the best performance in the GA indicates which samples should be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained on the data with learning conflicts and, after the conflicts are removed, a second artificial neural network is trained on the cleaned data. A noticeable reduction in the MSE is observed when the neural network is trained using the cleaned dataset.
• Learning conflicts that affect machine learning system performance. • Samples in a dataset with similar input values but different target values. • Retention error keeps meaningful samples while removing learning conflicts. • Artificial neural networks trained on datasets with learning conflicts. • Genetic algorithm improves the machine learning system performance. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2022.117835 |
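
The summary defines a learning conflict (similar inputs, different targets) and a fitness function that combines a retention error with the MSE. The sketch below illustrates those two ideas only; it is not the authors' implementation. The names `find_conflicts`, `retention_error`, and `fitness`, the tolerance thresholds, the binary keep-mask encoding, and the weighted-sum form of the fitness are all assumptions made for illustration, since the record does not give the exact definitions. The MSE is passed in as a number here, whereas in the paper it comes from evaluating a trained model.

```python
from itertools import combinations


def find_conflicts(X, y, input_tol=0.05, target_tol=0.5):
    """Return index pairs whose inputs are close but whose targets differ.

    input_tol and target_tol are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    pairs = []
    for i, j in combinations(range(len(X)), 2):
        # Largest per-feature difference between the two input vectors.
        input_gap = max(abs(a - b) for a, b in zip(X[i], X[j]))
        if input_gap <= input_tol and abs(y[i] - y[j]) > target_tol:
            pairs.append((i, j))
    return pairs


def retention_error(mask):
    """Fraction of samples dropped by a binary keep-mask (0.0 = all kept)."""
    return 1.0 - sum(mask) / len(mask)


def fitness(mask, mse, weight=0.5):
    """Hypothetical score to minimise: weighted sum of retention error and MSE."""
    return weight * retention_error(mask) + (1.0 - weight) * mse


# Toy data: samples 0 and 1 have nearly identical inputs but opposite targets.
X = [[0.10, 0.20], [0.11, 0.20], [0.90, 0.80], [0.50, 0.50]]
y = [0.0, 1.0, 1.0, 0.0]
conflicts = find_conflicts(X, y)

# One candidate "individual": keep every sample except sample 1.
mask = [1, 0, 1, 1]
score = fitness(mask, mse=0.1)
```

In a genetic algorithm along these lines, each individual would be one such mask, and selection would favour masks that drop few samples while yielding a low MSE for the model trained on the retained samples.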