Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing

Bibliographic Details
Published in:	Journal of medical Internet research 2013-04, Vol.15 (4), p.e73-e73
Main Authors:	Zhai, Haijun; Lingren, Todd; Deleger, Louise; Li, Qi; Kaiser, Megan; Stoutenborough, Laura; Solti, Imre
Format:	Article
Language:	English
Subjects:	Agreements; Annotations; Attributes; Automation; Chi-square test; Clinical research; Clinical standards; Clinical trials; Clinical Trials as Topic - statistics & numerical data; Computational linguistics; Cost control; Crowdsourcing; Crowdsourcing - standards; Crowdsourcing - statistics & numerical data; Drug dosages; Drugs; Health information; Humans; Infrastructure; Interfaces; Internet; Language processing; Machine learning; Mediation; Medical records; Medical research; Mutation; Natural language interfaces; Natural Language Processing; Original Paper; Pilot Projects; Quality Control; Quality management; Social Media; Technology; Telemedicine - standards; Telemedicine - statistics & numerical data; Usability; User interface; Voting; Web 2.0; Workers

Description
Summary:	A high-quality gold standard is vital for supervised, machine learning-based clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally developed gold standards. The previously reported results on a medical named entity annotation task showed an F-measure-based agreement of 0.68 between crowdsourced and traditionally developed corpora. Building upon previous work in general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain, with special emphasis on achieving high agreement between crowdsourced and traditionally developed corpora. To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double-annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work and tested the statistical significance (P
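As an illustration of the metrics named in the summary, the following is a minimal Python sketch of how sensitivity, precision, and F-measure could be computed when crowd annotations are scored against a traditionally developed gold standard. The `Span` type, the `agreement` function, the exact-match criterion, and the sample data are all assumptions made for illustration; the record does not describe the authors' matching rules or implementation.

```python
from typing import Dict, NamedTuple, Set


class Span(NamedTuple):
    """One annotated entity (hypothetical representation)."""
    doc_id: str
    start: int   # character offset where the entity begins
    end: int     # character offset where the entity ends
    label: str   # e.g. "medication_name"


def agreement(gold: Set[Span], crowd: Set[Span]) -> Dict[str, float]:
    """Score crowd annotations against gold using exact span+label matches."""
    tp = len(gold & crowd)   # spans found by both
    fp = len(crowd - gold)   # spans only the crowd marked
    fn = len(gold - crowd)   # gold spans the crowd missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Invented sample data: three gold spans, three crowd spans, two in common.
gold = {
    Span("cta-1", 10, 17, "medication_name"),
    Span("cta-1", 40, 49, "medication_name"),
    Span("cta-2", 5, 12, "medication_type"),
}
crowd = {
    Span("cta-1", 10, 17, "medication_name"),
    Span("cta-2", 5, 12, "medication_type"),
    Span("cta-2", 30, 36, "medication_name"),
}
scores = agreement(gold, crowd)
print(scores)
```

Exact matching is the strictest choice; a relaxed criterion based on overlapping spans would generally yield higher agreement, so the matching rule should be reported alongside any score.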
ISSN:	1438-8871 1439-4456
DOI:	10.2196/jmir.2426