
On removing conflicts for machine learning

A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may have learning conflicts that affect the performance of the machine learning system. A learning conflict is produced when two or more samples in a dataset have similar input values...

Bibliographic Details
Published in: Expert systems with applications, 2022-11, Vol. 206, p. 117835, Article 117835
Main Authors: Ledesma, Sergio; Ibarra-Manzano, Mario-Alberto; Almanza-Ojeda, Dora-Luz; Avina-Cervantes, Juan Gabriel; Cabal-Yepez, Eduardo
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3
cites cdi_FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3
container_end_page
container_issue
container_start_page 117835
container_title Expert systems with applications
container_volume 206
creator Ledesma, Sergio
Ibarra-Manzano, Mario-Alberto
Almanza-Ojeda, Dora-Luz
Avina-Cervantes, Juan Gabriel
Cabal-Yepez, Eduardo
description A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may have learning conflicts that affect the performance of the machine learning system. A learning conflict is produced when two or more samples in a dataset have similar input values but different target values. In this work, we propose a method to remove learning conflicts from a dataset. Our method is based on a genetic algorithm that tries to keep the samples that are free of conflicts and attempts to remove the samples with conflicts. Each individual in the genetic algorithm represents a possible dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while removing learning conflicts. Additionally, the fitness function comprises the Mean-Squared Error (MSE), which validates the machine learning performance. The algorithm is designed to keep as many samples as possible while the machine learning system exhibits the highest possible performance. Therefore, the proposal consists of first cleaning the dataset: the Genetic Algorithm (GA) compares the individuals and highlights the one with the best performance, recommending which samples must be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained using the data with learning conflicts. After removing the conflicts, a second artificial neural network is trained using the cleaned dataset. A noticeable reduction in the mean-squared error is observed when the neural network is trained using the cleaned dataset.
• Learning conflicts affect machine learning system performance.
• Conflicting samples in a dataset have similar input values but different target values.
• Retention error keeps meaningful samples while removing learning conflicts.
• Artificial neural networks are trained with datasets containing learning conflicts.
• A genetic algorithm improves the machine learning system performance.
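The definition of a learning conflict and the two-part fitness function in the description above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the similarity thresholds, the exact formula for retention error, or how it is combined with the MSE, so the tolerances `input_tol` and `target_tol`, the "fraction of samples dropped" reading of retention error, and the weighted sum with `weight` are all assumptions made for illustration.

```python
from itertools import combinations


def find_conflicts(inputs, targets, input_tol=0.05, target_tol=0.5):
    """Return index pairs whose inputs are close but whose targets differ.

    The tolerances are hypothetical; the record does not say how
    "similar" inputs or "different" targets are measured.
    """
    conflicts = []
    for i, j in combinations(range(len(inputs)), 2):
        # Largest per-feature difference between the two input vectors.
        input_gap = max(abs(a - b) for a, b in zip(inputs[i], inputs[j]))
        target_gap = abs(targets[i] - targets[j])
        if input_gap <= input_tol and target_gap > target_tol:
            conflicts.append((i, j))
    return conflicts


def retention_error(mask):
    """Assumed definition: fraction of samples dropped (0.0 means all kept).

    `mask` plays the role of one GA individual: a 0/1 entry per sample.
    """
    return 1.0 - sum(mask) / len(mask)


def fitness(mask, mse, weight=0.5):
    """Assumed combination (lower is better): weighted sum of the model's
    MSE on the retained samples and the retention error of the mask."""
    return weight * mse + (1.0 - weight) * retention_error(mask)


# Toy dataset: samples 0 and 1 have nearly identical inputs but opposite targets.
inputs = [[0.10, 0.20], [0.11, 0.20], [0.90, 0.80]]
targets = [0.0, 1.0, 1.0]
conflict_pairs = find_conflicts(inputs, targets)
dropped_fraction = retention_error([1, 0, 1])  # individual that drops sample 1
```

In a full genetic algorithm, each individual would be one such mask, the MSE would come from training a model on the retained samples, and selection would favour masks with the lowest fitness value.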
doi_str_mv 10.1016/j.eswa.2022.117835
format article
fullrecord <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_eswa_2022_117835</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417422010934</els_id><sourcerecordid>S0957417422010934</sourcerecordid><originalsourceid>FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3</originalsourceid><addsrcrecordid>eNp9j01LxDAURYMoWEf_gKuuhdb3kqZpwY0MOgoDs9F1SNMXTemHJGXEf2-HunZ1F5dzuYexW4QcAcv7Lqf4bXIOnOeIqhLyjCVYKZGVqhbnLIFaqqxAVVyyqxg7AFQAKmF3hzENNExHP36kdhpd7-0cUzeFdDD204-U9mTCuNTX7MKZPtLNX27Y-_PT2_Yl2x92r9vHfWYFwJwJrCWvsaytcU1pJLaFaAupmqJyaIGjkFIoR1wVFtqmbIyr0KDinBqu6kZsGF93bZhiDOT0V_CDCT8aQZ9sdadPtvpkq1fbBXpYIVqeHT0FHa2n0VLrA9lZt5P_D_8FWpJcsg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On removing conflicts for machine learning</title><source>Elsevier</source><creator>Ledesma, Sergio ; Ibarra-Manzano, Mario-Alberto ; Almanza-Ojeda, Dora-Luz ; Avina-Cervantes, Juan Gabriel ; Cabal-Yepez, Eduardo</creator><creatorcontrib>Ledesma, Sergio ; Ibarra-Manzano, Mario-Alberto ; Almanza-Ojeda, Dora-Luz ; Avina-Cervantes, Juan Gabriel ; Cabal-Yepez, Eduardo</creatorcontrib><description>A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may have learning conflicts that affect the performance of the machine learning system. A learning conflict is produced when two or more samples in a dataset have similar input values but different target values. We propose a method to remove learning conflicts from a dataset in this work. Our method is based on a genetic algorithm tries to keep those samples that free of conflicts and intents to remove those samples with conflicts. Each individual in the genetic algorithm represents a possible dataset. 
We introduce the concept of retention error in the fitness function, which describes how many samples are kept while removing learning conflicts. Additionally, the fitness function comprises the Mean-Squared Error (MSE), which validates the machine learning performance. The algorithm is designed to keep as many samples as possible while the machine learning system exhibits the highest possible performance. Therefore, the proposal consists of first cleaning the dataset: the Genetic Algorithm (GA) compares the individuals and highlights the one with the best performance, recommending which samples must be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained using the data with learning conflicts. After removing the conflicts, a second artificial neural network is trained using the cleaned dataset. A noticeable reduction in the mean-squared error is observed when the neural network is trained using the cleaned dataset. 
•Learning conflicts that affect machine learning system performance.•Samples in a dataset with similar values but different target.•Retention error kept meaningful samples while removing learning conflicts.•Artificial neural networks trained with learning conflicts datasets.•Generic algorithm improves the machine learning system performance.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2022.117835</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Artificial neural networks ; Genetic algorithm ; Learning conflicts ; Machine learning ; Training issues</subject><ispartof>Expert systems with applications, 2022-11, Vol.206, p.117835, Article 117835</ispartof><rights>2022 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3</citedby><cites>FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3</cites><orcidid>0000-0001-8411-8740 ; 0000-0002-3373-0929 ; 0000-0003-4317-0248 ; 0000-0003-1730-3748 ; 0000-0001-6903-4434</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids></links><search><creatorcontrib>Ledesma, Sergio</creatorcontrib><creatorcontrib>Ibarra-Manzano, Mario-Alberto</creatorcontrib><creatorcontrib>Almanza-Ojeda, Dora-Luz</creatorcontrib><creatorcontrib>Avina-Cervantes, Juan Gabriel</creatorcontrib><creatorcontrib>Cabal-Yepez, Eduardo</creatorcontrib><title>On removing conflicts for machine learning</title><title>Expert systems with applications</title><description>A Machine Learning (ML) system learns from a set of samples called the training set. 
In some cases, the training set may have learning conflicts that affect the performance of the machine learning system. A learning conflict is produced when two or more samples in a dataset have similar input values but different target values. In this work, we propose a method to remove learning conflicts from a dataset. Our method is based on a genetic algorithm that tries to keep the samples that are free of conflicts and attempts to remove the samples with conflicts. Each individual in the genetic algorithm represents a possible dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while removing learning conflicts. Additionally, the fitness function comprises the Mean-Squared Error (MSE), which validates the machine learning performance. The algorithm is designed to keep as many samples as possible while the machine learning system exhibits the highest possible performance. Therefore, the proposal consists of first cleaning the dataset: the Genetic Algorithm (GA) compares the individuals and highlights the one with the best performance, recommending which samples must be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained using the data with learning conflicts. After removing the conflicts, a second artificial neural network is trained using the cleaned dataset. A noticeable reduction in the mean-squared error is observed when the neural network is trained using the cleaned dataset. 
•Learning conflicts that affect machine learning system performance.•Samples in a dataset with similar values but different target.•Retention error kept meaningful samples while removing learning conflicts.•Artificial neural networks trained with learning conflicts datasets.•Generic algorithm improves the machine learning system performance.</description><subject>Artificial neural networks</subject><subject>Genetic algorithm</subject><subject>Learning conflicts</subject><subject>Machine learning</subject><subject>Training issues</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9j01LxDAURYMoWEf_gKuuhdb3kqZpwY0MOgoDs9F1SNMXTemHJGXEf2-HunZ1F5dzuYexW4QcAcv7Lqf4bXIOnOeIqhLyjCVYKZGVqhbnLIFaqqxAVVyyqxg7AFQAKmF3hzENNExHP36kdhpd7-0cUzeFdDD204-U9mTCuNTX7MKZPtLNX27Y-_PT2_Yl2x92r9vHfWYFwJwJrCWvsaytcU1pJLaFaAupmqJyaIGjkFIoR1wVFtqmbIyr0KDinBqu6kZsGF93bZhiDOT0V_CDCT8aQZ9sdadPtvpkq1fbBXpYIVqeHT0FHa2n0VLrA9lZt5P_D_8FWpJcsg</recordid><startdate>20221115</startdate><enddate>20221115</enddate><creator>Ledesma, Sergio</creator><creator>Ibarra-Manzano, Mario-Alberto</creator><creator>Almanza-Ojeda, Dora-Luz</creator><creator>Avina-Cervantes, Juan Gabriel</creator><creator>Cabal-Yepez, Eduardo</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-8411-8740</orcidid><orcidid>https://orcid.org/0000-0002-3373-0929</orcidid><orcidid>https://orcid.org/0000-0003-4317-0248</orcidid><orcidid>https://orcid.org/0000-0003-1730-3748</orcidid><orcidid>https://orcid.org/0000-0001-6903-4434</orcidid></search><sort><creationdate>20221115</creationdate><title>On removing conflicts for machine learning</title><author>Ledesma, Sergio ; Ibarra-Manzano, Mario-Alberto ; Almanza-Ojeda, Dora-Luz ; Avina-Cervantes, Juan Gabriel ; Cabal-Yepez, 
Eduardo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Artificial neural networks</topic><topic>Genetic algorithm</topic><topic>Learning conflicts</topic><topic>Machine learning</topic><topic>Training issues</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ledesma, Sergio</creatorcontrib><creatorcontrib>Ibarra-Manzano, Mario-Alberto</creatorcontrib><creatorcontrib>Almanza-Ojeda, Dora-Luz</creatorcontrib><creatorcontrib>Avina-Cervantes, Juan Gabriel</creatorcontrib><creatorcontrib>Cabal-Yepez, Eduardo</creatorcontrib><collection>CrossRef</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ledesma, Sergio</au><au>Ibarra-Manzano, Mario-Alberto</au><au>Almanza-Ojeda, Dora-Luz</au><au>Avina-Cervantes, Juan Gabriel</au><au>Cabal-Yepez, Eduardo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On removing conflicts for machine learning</atitle><jtitle>Expert systems with applications</jtitle><date>2022-11-15</date><risdate>2022</risdate><volume>206</volume><spage>117835</spage><pages>117835-</pages><artnum>117835</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may have learning conflicts that affect the performance of the machine learning system. A learning conflict is produced when two or more samples in a dataset have similar input values but different target values. We propose a method to remove learning conflicts from a dataset in this work. 
Our method is based on a genetic algorithm that tries to keep the samples that are free of conflicts and attempts to remove the samples with conflicts. Each individual in the genetic algorithm represents a possible dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while removing learning conflicts. Additionally, the fitness function comprises the Mean-Squared Error (MSE), which validates the machine learning performance. The algorithm is designed to keep as many samples as possible while the machine learning system exhibits the highest possible performance. Therefore, the proposal consists of first cleaning the dataset: the Genetic Algorithm (GA) compares the individuals and highlights the one with the best performance, recommending which samples must be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained using the data with learning conflicts. After removing the conflicts, a second artificial neural network is trained using the cleaned dataset. A noticeable reduction in the mean-squared error is observed when the neural network is trained using the cleaned dataset. 
•Learning conflicts that affect machine learning system performance.•Samples in a dataset with similar values but different target.•Retention error kept meaningful samples while removing learning conflicts.•Artificial neural networks trained with learning conflicts datasets.•Generic algorithm improves the machine learning system performance.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2022.117835</doi><orcidid>https://orcid.org/0000-0001-8411-8740</orcidid><orcidid>https://orcid.org/0000-0002-3373-0929</orcidid><orcidid>https://orcid.org/0000-0003-4317-0248</orcidid><orcidid>https://orcid.org/0000-0003-1730-3748</orcidid><orcidid>https://orcid.org/0000-0001-6903-4434</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2022-11, Vol.206, p.117835, Article 117835
issn 0957-4174
1873-6793
language eng
recordid cdi_crossref_primary_10_1016_j_eswa_2022_117835
source Elsevier
subjects Artificial neural networks
Genetic algorithm
Learning conflicts
Machine learning
Training issues
title On removing conflicts for machine learning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T11%3A32%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20removing%20conflicts%20for%20machine%20learning&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Ledesma,%20Sergio&rft.date=2022-11-15&rft.volume=206&rft.spage=117835&rft.pages=117835-&rft.artnum=117835&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2022.117835&rft_dat=%3Celsevier_cross%3ES0957417422010934%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c300t-319529169cafb6a51d43d457b48f1c02135537fe274c0db6baf81a1722eb279b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true