On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification
Published in: Neurocomputing (Amsterdam), 2014-05, Vol. 132, p. 30-41
Main Authors: Triguero, Isaac; Sáez, José A.; Luengo, Julián; García, Salvador; Herrera, Francisco
Format: Article
Language: English
container_end_page | 41 |
container_issue | |
container_start_page | 30 |
container_title | Neurocomputing (Amsterdam) |
container_volume | 132 |
creator | Triguero, Isaac; Sáez, José A.; Luengo, Julián; García, Salvador; Herrera, Francisco |
description | Semi-supervised classification methods have received much attention as suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. Several semi-supervised learning models have been proposed with different assumptions about the characteristics of the input data. Among them, the self-training process has emerged as a simple and effective technique, which does not require any specific hypotheses about the training data. Despite its effectiveness, the self-training algorithm usually makes erroneous predictions, mainly at the initial stages, if noisy examples are labeled and incorporated into the training set.
Noise filters are commonly used to remove corrupted data in standard classification. In 2005, Li and Zhou proposed the addition of a statistical filter to the self-training process. Nevertheless, in this approach, filtering methods have to deal with a reduced number of labeled instances and the erroneous predictions this may induce. In this work, we analyze the integration of a wide variety of noise filters into the self-training process to distinguish the most relevant features of filters. We focus on the nearest neighbor rule as a base classifier and ten different noise filters. We provide an extensive analysis of the performance of these filters considering different ratios of labeled data. The results are contrasted with nonparametric statistical tests that allow us to identify relevant filters, and their main characteristics, in the field of semi-supervised learning.
• The filtering process is more complex in SSL due to the small number of labeled examples.
• Inclusion of erroneous examples in the labeled data can alter inductive capabilities.
• Self-training with filtering finds robust learned hypotheses to predict unseen cases.
• Global filters stand out as the best performing family of filters in SSL.
• Local approaches need more labeled data to perform better. |
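The self-training-plus-filter loop the abstract describes can be sketched as follows. This is a minimal illustration under our own assumptions — a 1-NN base classifier, an ENN-style (Edited Nearest Neighbor) filter, a toy distance-based confidence proxy, and invented data — not the paper's experimental setup or code.

```python
import math

def nn_predict(train, x, k=1):
    """Predict the majority label among the k nearest training points."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

def enn_filter(train, k=3):
    """ENN-style filter: drop any example whose label disagrees with the
    majority vote of its k nearest neighbors (leave-one-out)."""
    kept = []
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        if nn_predict(rest, x, k) == y:
            kept.append((x, y))
    return kept

def self_training(labeled, unlabeled, per_round=2, max_rounds=10):
    """Self-training loop: repeatedly label the most 'confident' unlabeled
    points with 1-NN, then clean the enlarged labeled set with the filter."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # Crude confidence proxy for 1-NN: distance to the closest labeled point.
        unlabeled.sort(key=lambda x: min(math.dist(x, p) for p, _ in labeled))
        batch, unlabeled = unlabeled[:per_round], unlabeled[per_round:]
        candidates = [(x, nn_predict(labeled, x)) for x in batch]
        # Noise-filtering step: remove suspect labels before the next round.
        labeled = enn_filter(labeled + candidates, k=3)
    return labeled
```

The filtering step is the point of the paper's analysis: without it, an early 1-NN mistake is added to the labeled set and propagates through later rounds; with a filter, the enlarged labeled set is cleaned before it is reused.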
doi_str_mv | 10.1016/j.neucom.2013.05.055 |
format | article |
publisher | Amsterdam: Elsevier B.V. |
startdate | 2014-05-20 |
identifier | ISSN: 0925-2312 |
ispartof | Neurocomputing (Amsterdam), 2014-05, Vol.132, p.30-41 |
issn | 0925-2312; 1872-8286 |
language | eng |
source | ScienceDirect Journals |
subjects | Algorithms; Applied sciences; Artificial intelligence; Classification; Computer science, control theory, systems; Exact sciences and technology; Hypotheses; Learning; Learning and adaptive systems; Mathematical models; Nearest neighbor classification; Noise; Noise filters; Noisy data; Self-training; Semi-supervised learning; Training |
title | On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification |