On removing conflicts for machine learning

Bibliographic Details
Published in:	Expert systems with applications 2022-11, Vol.206, p.117835, Article 117835
Main Authors:	Ledesma, Sergio; Ibarra-Manzano, Mario-Alberto; Almanza-Ojeda, Dora-Luz; Avina-Cervantes, Juan Gabriel; Cabal-Yepez, Eduardo
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Genetic algorithm; Learning conflicts; Machine learning; Training issues

Description:	A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may contain learning conflicts that degrade the performance of the ML system. A learning conflict occurs when two or more samples in a dataset have similar input values but different target values. In this work, we propose a method to remove learning conflicts from a dataset. The method is based on a Genetic Algorithm (GA) that tries to keep the samples that are free of conflicts and to remove the samples that are in conflict. Each individual in the GA represents a candidate dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while learning conflicts are removed. The fitness function also includes the Mean-Squared Error (MSE), which validates the performance of the ML system. The algorithm is designed to keep as many samples as possible while the ML system exhibits the highest possible performance. The proposal therefore consists of first cleaning the dataset: the GA compares the individuals, highlights the one with the best performance, and recommends which samples should be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained on the data with learning conflicts; after the conflicts are removed, a second artificial neural network is trained on the cleaned dataset. A noticeable reduction in the MSE is observed when the neural network is trained on the cleaned dataset.
Highlights:
• Learning conflicts affect the performance of machine learning systems.
• Learning conflicts are samples in a dataset with similar input values but different targets.
• A retention error keeps meaningful samples while learning conflicts are removed.
• Artificial neural networks are trained on datasets with learning conflicts.
• A genetic algorithm improves the performance of the machine learning system.
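The abstract does not state how input similarity is measured, how GA individuals are encoded, or how the two fitness terms are weighted. The Python sketch below illustrates the general idea under explicit assumptions and is not the authors' implementation. Two samples are assumed to conflict when their inputs are within a hypothetical distance `eps_x` and their targets differ by more than a hypothetical `eps_y`. Each GA individual is assumed to be a 0/1 keep-mask over the samples. The retention error is taken to be the fraction of samples dropped, and fitness is MSE plus a hypothetical weight `lam` times the retention error (lower is better). To keep each evaluation cheap, a leave-one-out 1-nearest-neighbour regressor stands in for the artificial neural network trained in the paper. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

# One sample = (input vector, scalar target). Illustrative representation only.
Sample = Tuple[Sequence[float], float]


def find_conflicts(data: List[Sample], eps_x: float, eps_y: float) -> List[Tuple[int, int]]:
    """Index pairs whose inputs lie within eps_x but whose targets differ by more than eps_y."""
    pairs = []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            (xi, yi), (xj, yj) = data[i], data[j]
            if math.dist(xi, xj) <= eps_x and abs(yi - yj) > eps_y:
                pairs.append((i, j))
    return pairs


def loo_nn_mse(data: List[Sample], mask: List[int]) -> float:
    """Leave-one-out 1-nearest-neighbour MSE on the kept samples (a cheap stand-in for an ANN)."""
    kept = [data[i] for i, bit in enumerate(mask) if bit]
    if len(kept) < 2:
        return float("inf")  # too few samples to evaluate
    total = 0.0
    for i, (xi, yi) in enumerate(kept):
        # Predict yi with the target of the closest *other* kept sample.
        _, y_pred = min((math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(kept) if j != i)
        total += (yi - y_pred) ** 2
    return total / len(kept)


def fitness(data: List[Sample], mask: List[int], lam: float) -> float:
    """Lower is better: MSE plus lam times the retention error (fraction of samples dropped)."""
    retention_error = 1.0 - sum(mask) / len(mask)
    return loo_nn_mse(data, mask) + lam * retention_error


def ga_clean(data: List[Sample], lam: float = 1.0, pop_size: int = 30,
             generations: int = 60, p_mut: float = 0.05, seed: int = 0) -> List[int]:
    """Evolve 0/1 keep-masks over the samples and return the fittest mask found."""
    rng = random.Random(seed)
    n = len(data)
    # Start from the full dataset plus random masks that keep roughly 90% of the samples.
    pop = [[1] * n] + [[int(rng.random() < 0.9) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(((fitness(data, m, lam), m) for m in pop), key=lambda t: t[0])
        next_pop = [m for _, m in scored[:2]]  # elitism: keep the two best masks unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of three random individuals becomes a parent.
            a = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            b = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            child = [p if rng.random() < 0.5 else q for p, q in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=lambda m: fitness(data, m, lam))


if __name__ == "__main__":
    # Toy data: y = x, plus one extra sample at x = 5 with a conflicting target of 20.
    toy = [((float(i),), float(i)) for i in range(10)] + [((5.0,), 20.0)]
    print(find_conflicts(toy, eps_x=0.1, eps_y=1.0))  # -> [(5, 10)]
    best = ga_clean(toy, lam=1.0)
    print("dropped indices:", [i for i, bit in enumerate(best) if not bit])
```

In the toy data, dropping either member of the conflicting pair removes the conflict, but dropping the extra sample (index 10) leaves the lower MSE, so the search is expected to discard it; as with any GA, the outcome depends on the seed and parameters. The weight `lam` sets the trade-off described in the abstract: a larger value favours retaining samples, a smaller one favours more aggressive cleaning.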
DOI:	10.1016/j.eswa.2022.117835
Publisher:	Elsevier Ltd
Publication date:	2022-11-15
Rights:	2022 Elsevier Ltd
Peer reviewed:	Yes
Author ORCID iDs:	0000-0001-8411-8740; 0000-0002-3373-0929; 0000-0003-4317-0248; 0000-0003-1730-3748; 0000-0001-6903-4434
ISSN:	0957-4174 (print); 1873-6793 (electronic)