
NeuCrowd: neural sampling network for representation learning with crowdsourced labels


Bibliographic Details
Published in: Knowledge and Information Systems, 2022-04, Vol. 64 (4), p. 995-1012
Main Authors: Hao, Yang; Ding, Wenbiao; Liu, Zitao
Format: Article
Language: English
Subjects: Algorithms; Computer Science; Crowdsourcing; Data Mining and Knowledge Discovery; Database Management; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Labels; Machine learning; Regular Paper; Representations; Sampling; Training
Description: Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios such as healthcare, smart city, and education. In practice, people resort to crowdsourcing to obtain annotated labels. However, due to issues such as data privacy, budget limitations, and a shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms may easily suffer from overfitting and yield suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation, and (2) automatically learns a neural sampling network that adaptively selects effective samples for SRL networks. The framework is evaluated on one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/NeuCrowd_KAIS2021.
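The description above only sketches the two ingredients of the framework (safety-aware n-tuplet construction with robust anchors, and a learned sampling network); the authoritative implementation is in the linked repository. As a rough illustration of the n-tuplet idea only, the toy NumPy sketch below builds one (anchor, positive, negatives) tuplet from noisily labeled points, using a class centroid as a stand-in "robust anchor", and evaluates a softmax-style n-tuplet loss. The function names, the centroid anchor rule, and the exact loss form are assumptions made for this sketch, not details taken from the paper or its code.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumes an n-tuplet (anchor, positive, negative_1, ..., negative_{n-2})
# and a softmax-style n-tuplet loss over dot-product similarities.
import numpy as np

rng = np.random.default_rng(0)

def robust_anchor(pos_embeddings: np.ndarray) -> np.ndarray:
    """Assumed robust anchor: the centroid of several same-class embeddings,
    which smooths out noise from inconsistent crowdsourced labels."""
    return pos_embeddings.mean(axis=0)

def n_tuplet_loss(anchor, positive, negatives):
    """Push the anchor toward the positive and away from every negative:
    log(1 + sum_j exp(a.n_j - a.p)), i.e. -log softmax of the positive."""
    pos_sim = anchor @ positive
    neg_sims = negatives @ anchor
    return np.log1p(np.exp(neg_sims - pos_sim).sum())

# Toy data: 2-D "embeddings" with noisy crowd labels (0/1).
emb = rng.normal(size=(100, 2))
labels = (emb[:, 0] > 0).astype(int)
labels ^= rng.random(100) < 0.1          # flip ~10% of labels as crowd noise

# Build one n-tuplet for class 1 (n = 6: anchor + positive + 4 negatives).
pos_idx = np.flatnonzero(labels == 1)
neg_idx = np.flatnonzero(labels == 0)
anchor = robust_anchor(emb[rng.choice(pos_idx, size=3, replace=False)])
positive = emb[rng.choice(pos_idx)]
negatives = emb[rng.choice(neg_idx, size=4, replace=False)]

print("n-tuplet loss:", n_tuplet_loss(anchor, positive, negatives))
```

In NeuCrowd's setting, a sampling network would additionally score candidate tuplets so that training focuses on the most effective ones; that component is omitted here.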
DOI: 10.1007/s10115-021-01644-7
ISSN: 0219-1377
EISSN: 0219-3116
Source: ABI/INFORM Global; Springer Link