NeuCrowd: neural sampling network for representation learning with crowdsourced labels
Published in: | Knowledge and information systems 2022-04, Vol.64 (4), p.995-1012 |
---|---|
Main Authors: | Hao, Yang; Ding, Wenbiao; Liu, Zitao |
Format: | Article |
Language: | English |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c270t-d1ee0395d3e7da94c473652c6e712316ae13c4f7fe780f425fe234fdada43ea03 |
container_end_page | 1012 |
container_issue | 4 |
container_start_page | 995 |
container_title | Knowledge and information systems |
container_volume | 64 |
creator | Hao, Yang Ding, Wenbiao Liu, Zitao |
description | Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, smart city, and education. In practice, people resort to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitations, and a shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms may easily lead to overfitting and yield suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively selects effective samples for SRL networks. The proposed framework is evaluated on one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/NeuCrowd_KAIS2021. |
doi_str_mv | 10.1007/s10115-021-01644-7 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2646394189</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2646394189</sourcerecordid><originalsourceid>FETCH-LOGICAL-c270t-d1ee0395d3e7da94c473652c6e712316ae13c4f7fe780f425fe234fdada43ea03</originalsourceid><addsrcrecordid>eNp9kE9LxDAQxYMouK5-AU8Fz9VMkiZbb1L8B4te1GuIzWTt2k1r0rL47c3aBW-eZgbee_P4EXIO9BIoVVcRKECRUwY5BSlErg7ILF1lzgHk4X4HrtQxOYlxTSkoCTAjb084VqHb2uvM4xhMm0Wz6dvGr9I9bLvwmbkuZAH7gBH9YIam81mLJvidZtsMH1m988duDDXarDXv2MZTcuRMG_FsP-fk9e72pXrIl8_3j9XNMq-ZokNuAZHysrAclTWlqIXismC1RAWMgzQIvBZOOVQL6gQrHDIunDXWCI6G8jm5mHL70H2NGAe9Tj18eqmZFJKXAhZlUrFJlZrGGNDpPjQbE741UL3jpyd-OlHSv_y0SiY-mWIS-xWGv-h_XD9wBXRe</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2646394189</pqid></control><display><type>article</type><title>NeuCrowd: neural sampling network for representation learning with crowdsourced labels</title><source>ABI/INFORM Global</source><source>Springer Link</source><creator>Hao, Yang ; Ding, Wenbiao ; Liu, Zitao</creator><creatorcontrib>Hao, Yang ; Ding, Wenbiao ; Liu, Zitao</creatorcontrib><description>Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, smart city, and education. In practice, people refer to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitation, shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms may easily get the overfitting problem and yield suboptimal solutions. In this paper, we propose
NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for SRL networks. The proposed framework is evaluated on both one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at
https://github.com/tal-ai/NeuCrowd_KAIS2021
.</description><identifier>ISSN: 0219-1377</identifier><identifier>EISSN: 0219-3116</identifier><identifier>DOI: 10.1007/s10115-021-01644-7</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Algorithms ; Computer Science ; Crowdsourcing ; Data Mining and Knowledge Discovery ; Database Management ; Information Storage and Retrieval ; Information Systems and Communication Service ; Information Systems Applications (incl.Internet) ; IT in Business ; Labels ; Machine learning ; Regular Paper ; Representations ; Sampling ; Training</subject><ispartof>Knowledge and information systems, 2022-04, Vol.64 (4), p.995-1012</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022</rights><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c270t-d1ee0395d3e7da94c473652c6e712316ae13c4f7fe780f425fe234fdada43ea03</cites><orcidid>0000-0003-0491-307X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2646394189/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2646394189?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,11688,27924,27925,36060,44363,74895</link.rule.ids></links><search><creatorcontrib>Hao, Yang</creatorcontrib><creatorcontrib>Ding, Wenbiao</creatorcontrib><creatorcontrib>Liu, Zitao</creatorcontrib><title>NeuCrowd: neural sampling network for representation learning with crowdsourced labels</title><title>Knowledge and information systems</title><addtitle>Knowl Inf Syst</addtitle><description>Representation learning approaches require a 
massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, smart city, and education. In practice, people refer to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitation, shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms may easily get the overfitting problem and yield suboptimal solutions. In this paper, we propose
NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for SRL networks. The proposed framework is evaluated on both one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at
https://github.com/tal-ai/NeuCrowd_KAIS2021
.</description><subject>Algorithms</subject><subject>Computer Science</subject><subject>Crowdsourcing</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Database Management</subject><subject>Information Storage and Retrieval</subject><subject>Information Systems and Communication Service</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>IT in Business</subject><subject>Labels</subject><subject>Machine learning</subject><subject>Regular Paper</subject><subject>Representations</subject><subject>Sampling</subject><subject>Training</subject><issn>0219-1377</issn><issn>0219-3116</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp9kE9LxDAQxYMouK5-AU8Fz9VMkiZbb1L8B4te1GuIzWTt2k1r0rL47c3aBW-eZgbee_P4EXIO9BIoVVcRKECRUwY5BSlErg7ILF1lzgHk4X4HrtQxOYlxTSkoCTAjb084VqHb2uvM4xhMm0Wz6dvGr9I9bLvwmbkuZAH7gBH9YIam81mLJvidZtsMH1m988duDDXarDXv2MZTcuRMG_FsP-fk9e72pXrIl8_3j9XNMq-ZokNuAZHysrAclTWlqIXismC1RAWMgzQIvBZOOVQL6gQrHDIunDXWCI6G8jm5mHL70H2NGAe9Tj18eqmZFJKXAhZlUrFJlZrGGNDpPjQbE741UL3jpyd-OlHSv_y0SiY-mWIS-xWGv-h_XD9wBXRe</recordid><startdate>20220401</startdate><enddate>20220401</enddate><creator>Hao, Yang</creator><creator>Ding, Wenbiao</creator><creator>Liu, Zitao</creator><general>Springer London</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0003-0491-307X</orcidid></search><sort><creationdate>20220401</creationdate><title>NeuCrowd: neural sampling network for representation learning with crowdsourced labels</title><author>Hao, Yang ; Ding, Wenbiao ; Liu, Zitao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c270t-d1ee0395d3e7da94c473652c6e712316ae13c4f7fe780f425fe234fdada43ea03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Computer Science</topic><topic>Crowdsourcing</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Database Management</topic><topic>Information Storage and Retrieval</topic><topic>Information Systems and Communication Service</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>IT in Business</topic><topic>Labels</topic><topic>Machine learning</topic><topic>Regular 
Paper</topic><topic>Representations</topic><topic>Sampling</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hao, Yang</creatorcontrib><creatorcontrib>Ding, Wenbiao</creatorcontrib><creatorcontrib>Liu, Zitao</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Complete</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer science 
database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Knowledge and information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hao, Yang</au><au>Ding, Wenbiao</au><au>Liu, Zitao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>NeuCrowd: neural sampling network for representation learning with crowdsourced labels</atitle><jtitle>Knowledge and information systems</jtitle><stitle>Knowl Inf Syst</stitle><date>2022-04-01</date><risdate>2022</risdate><volume>64</volume><issue>4</issue><spage>995</spage><epage>1012</epage><pages>995-1012</pages><issn>0219-1377</issn><eissn>0219-3116</eissn><abstract>Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, smart city, and education. In practice, people refer to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitation, shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. 
Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms may easily get the overfitting problem and yield suboptimal solutions. In this paper, we propose
NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for SRL networks. The proposed framework is evaluated on both one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at
https://github.com/tal-ai/NeuCrowd_KAIS2021
.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s10115-021-01644-7</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0003-0491-307X</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0219-1377 |
ispartof | Knowledge and information systems, 2022-04, Vol.64 (4), p.995-1012 |
issn | 0219-1377 0219-3116 |
language | eng |
recordid | cdi_proquest_journals_2646394189 |
source | ABI/INFORM Global; Springer Link |
subjects | Algorithms Computer Science Crowdsourcing Data Mining and Knowledge Discovery Database Management Information Storage and Retrieval Information Systems and Communication Service Information Systems Applications (incl.Internet) IT in Business Labels Machine learning Regular Paper Representations Sampling Training |
title | NeuCrowd: neural sampling network for representation learning with crowdsourced labels |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T11%3A01%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=NeuCrowd:%20neural%20sampling%20network%20for%20representation%20learning%20with%20crowdsourced%20labels&rft.jtitle=Knowledge%20and%20information%20systems&rft.au=Hao,%20Yang&rft.date=2022-04-01&rft.volume=64&rft.issue=4&rft.spage=995&rft.epage=1012&rft.pages=995-1012&rft.issn=0219-1377&rft.eissn=0219-3116&rft_id=info:doi/10.1007/s10115-021-01644-7&rft_dat=%3Cproquest_cross%3E2646394189%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c270t-d1ee0395d3e7da94c473652c6e712316ae13c4f7fe780f425fe234fdada43ea03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2646394189&rft_id=info:pmid/&rfr_iscdi=true |
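The abstract describes NeuCrowd's training data as n-tuplets built around anchors but gives no construction details. The sketch below is an illustrative assumption and is not the authors' code; their implementation lives in the linked repository. It draws one anchor, one same-label positive and n-1 different-label negatives from items that each carry a single aggregated label. It then scores an embedding tuplet with the common softmax-style n-tuplet loss, log(1 + Σ_j exp(a·n_j - a·p)). The helper names `build_n_tuplet` and `n_tuplet_loss` are hypothetical. Neither safety-aware sampling nor robust anchor generation is modeled.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_n_tuplet(
    labels: Dict[str, int], n: int, rng: random.Random
) -> Tuple[str, str, List[str]]:
    """Draw one n-tuplet: an anchor, one positive, and n-1 negatives.

    Hypothetical helper. Assumes every item already carries one aggregated
    label; it does NOT implement NeuCrowd's safety-aware sampling.
    """
    if n < 2:
        raise ValueError("an n-tuplet needs n >= 2")
    # Group item ids by label.
    by_label: Dict[int, List[str]] = {}
    for item, y in labels.items():
        by_label.setdefault(y, []).append(item)
    # A label can seed a tuplet only if it has >= 2 items (anchor + positive)
    # and the other labels together supply >= n-1 negatives.
    eligible = sorted(
        y for y, items in by_label.items()
        if len(items) >= 2 and len(labels) - len(items) >= n - 1
    )
    if not eligible:
        raise ValueError("not enough items to form an n-tuplet")
    anchor_label = rng.choice(eligible)
    anchor, positive = rng.sample(by_label[anchor_label], 2)
    others = [i for y, items in by_label.items() if y != anchor_label for i in items]
    negatives = rng.sample(others, n - 1)
    return anchor, positive, negatives


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def n_tuplet_loss(anchor: Vector, positive: Vector, negatives: List[Vector]) -> float:
    """Softmax-style n-tuplet loss: log(1 + sum_j exp(a.n_j - a.p))."""
    ap = dot(anchor, positive)
    return math.log1p(sum(math.exp(dot(anchor, neg) - ap) for neg in negatives))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = {"x1": 1, "x2": 1, "x3": 0, "x4": 0, "x5": 0}
    print(build_n_tuplet(labels, n=3, rng=rng))
    print(n_tuplet_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0], [0.0, 1.0]]))
```

In the sample run, five items and `n=3` produce an anchor and a positive that share a label, plus two negatives drawn from the other class. The loss falls as the positive moves closer to the anchor than the negatives do. This is the property a sampler that selects "effective" tuplets would exploit.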