On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification
Published in: Neurocomputing (Amsterdam), 2014-05, Vol. 132, p. 30-41
Main Authors: Triguero, Isaac; Sáez, José A.; Luengo, Julián; García, Salvador; Herrera, Francisco
Format: Article
Language: English
container_end_page | 41 |
container_issue | |
container_start_page | 30 |
container_title | Neurocomputing (Amsterdam) |
container_volume | 132 |
creator | Triguero, Isaac; Sáez, José A.; Luengo, Julián; García, Salvador; Herrera, Francisco |
description | Semi-supervised classification methods have received much attention as suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. Several semi-supervised learning models have been proposed with different assumptions about the characteristics of the input data. Among them, the self-training process has emerged as a simple and effective technique, which does not require any specific hypotheses about the training data. Despite its effectiveness, the self-training algorithm usually makes erroneous predictions, mainly at the initial stages, if noisy examples are labeled and incorporated into the training set.
Noise filters are commonly used to remove corrupted data in standard classification. In 2005, Li and Zhou proposed the addition of a statistical filter to the self-training process. Nevertheless, in this approach, filtering methods have to deal with a reduced number of labeled instances and the erroneous predictions this may induce. In this work, we analyze the integration of a wide variety of noise filters into the self-training process to distinguish the most relevant features of filters. We focus on the nearest neighbor rule as a base classifier and ten different noise filters. We provide an extensive analysis of the performance of these filters considering different ratios of labeled data. The results are contrasted with nonparametric statistical tests that allow us to identify relevant filters, and their main characteristics, in the field of semi-supervised learning.
• The filtering process is more complex in SSL due to the small number of labeled examples.
• Inclusion of erroneous examples in the labeled data can alter inductive capabilities.
• Self-training with filtering finds robust learned hypotheses to predict unseen cases.
• Global filters stand out as the best performing family of filters in SSL.
• Local approaches need more labeled data to perform better. |
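The self-training-plus-filter loop the abstract describes can be sketched as follows. This is a minimal illustration under our own assumptions — a 1-NN base classifier, an ENN-style (Edited Nearest Neighbor) filter, a toy distance-based confidence proxy, and invented data — not the paper's experimental setup or code.

```python
import math

def nn_predict(train, x, k=1):
    """Predict the majority label among the k nearest training points."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

def enn_filter(train, k=3):
    """ENN-style filter: drop any example whose label disagrees with the
    majority vote of its k nearest neighbors (leave-one-out)."""
    kept = []
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        if nn_predict(rest, x, k) == y:
            kept.append((x, y))
    return kept

def self_training(labeled, unlabeled, per_round=2, max_rounds=10):
    """Self-training loop: repeatedly label the most 'confident' unlabeled
    points with 1-NN, then clean the enlarged labeled set with the filter."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # Crude confidence proxy for 1-NN: distance to the closest labeled point.
        unlabeled.sort(key=lambda x: min(math.dist(x, p) for p, _ in labeled))
        batch, unlabeled = unlabeled[:per_round], unlabeled[per_round:]
        candidates = [(x, nn_predict(labeled, x)) for x in batch]
        # Noise-filtering step: remove suspect labels before the next round.
        labeled = enn_filter(labeled + candidates, k=3)
    return labeled
```

The filtering step is the point of the paper's analysis: without it, an early 1-NN mistake is added to the labeled set and propagates through later rounds; with a filter, the enlarged labeled set is cleaned before it is reused.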
doi_str_mv | 10.1016/j.neucom.2013.05.055 |
format | article |
publisher | Amsterdam: Elsevier B.V. |
startdate | 2014-05-20 |
identifier | ISSN: 0925-2312 |
ispartof | Neurocomputing (Amsterdam), 2014-05, Vol.132, p.30-41 |
issn | 0925-2312; 1872-8286 |
language | eng |
source | ScienceDirect Journals |
subjects | Algorithms; Applied sciences; Artificial intelligence; Classification; Computer science, control theory, systems; Exact sciences and technology; Hypotheses; Learning; Learning and adaptive systems; Mathematical models; Nearest neighbor classification; Noise; Noise filters; Noisy data; Self-training; Semi-supervised learning; Training |
title | On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification |