
NeuCrowd: neural sampling network for representation learning with crowdsourced labels


Bibliographic Details
Published in: Knowledge and Information Systems, 2022-04, Vol. 64 (4), p. 995-1012
Main Authors: Hao, Yang; Ding, Wenbiao; Liu, Zitao
Format: Article
Language: English
Subjects: Algorithms; Computer Science; Crowdsourcing; Data Mining and Knowledge Discovery; Database Management; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Labels; Machine learning; Regular Paper; Representations; Sampling; Training
Description: Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios such as healthcare, smart city, and education. In practice, people resort to crowdsourcing to obtain annotated labels. However, due to issues such as data privacy, budget limitations, and a shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms may easily suffer from overfitting and yield suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation, and (2) automatically learns a neural sampling network that adaptively selects effective samples for SRL networks. The framework is evaluated on one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/NeuCrowd_KAIS2021.
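The description above only sketches the two ingredients of the framework (safety-aware n-tuplet construction with robust anchors, and a learned sampling network); the authoritative implementation is in the linked repository. As a rough illustration of the n-tuplet idea only, the toy NumPy sketch below builds one (anchor, positive, negatives) tuplet from noisily labeled points, using a class centroid as a stand-in "robust anchor", and evaluates a softmax-style n-tuplet loss. The function names, the centroid anchor rule, and the exact loss form are assumptions made for this sketch, not details taken from the paper or its code.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumes an n-tuplet (anchor, positive, negative_1, ..., negative_{n-2})
# and a softmax-style n-tuplet loss over dot-product similarities.
import numpy as np

rng = np.random.default_rng(0)

def robust_anchor(pos_embeddings: np.ndarray) -> np.ndarray:
    """Assumed robust anchor: the centroid of several same-class embeddings,
    which smooths out noise from inconsistent crowdsourced labels."""
    return pos_embeddings.mean(axis=0)

def n_tuplet_loss(anchor, positive, negatives):
    """Push the anchor toward the positive and away from every negative:
    log(1 + sum_j exp(a.n_j - a.p)), i.e. -log softmax of the positive."""
    pos_sim = anchor @ positive
    neg_sims = negatives @ anchor
    return np.log1p(np.exp(neg_sims - pos_sim).sum())

# Toy data: 2-D "embeddings" with noisy crowd labels (0/1).
emb = rng.normal(size=(100, 2))
labels = (emb[:, 0] > 0).astype(int)
labels ^= rng.random(100) < 0.1          # flip ~10% of labels as crowd noise

# Build one n-tuplet for class 1 (n = 6: anchor + positive + 4 negatives).
pos_idx = np.flatnonzero(labels == 1)
neg_idx = np.flatnonzero(labels == 0)
anchor = robust_anchor(emb[rng.choice(pos_idx, size=3, replace=False)])
positive = emb[rng.choice(pos_idx)]
negatives = emb[rng.choice(neg_idx, size=4, replace=False)]

print("n-tuplet loss:", n_tuplet_loss(anchor, positive, negatives))
```

In NeuCrowd's setting, a sampling network would additionally score candidate tuplets so that training focuses on the most effective ones; that component is omitted here.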
DOI: 10.1007/s10115-021-01644-7
ISSN: 0219-1377
EISSN: 0219-3116
Source: ABI/INFORM Global; Springer Link