Cost Effective Annotation Framework Using Zero-Shot Text Classification
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Manual, high-quality annotation of social media data has enabled companies and researchers to develop improved implementations using natural language processing. However, human text annotation is expensive and time-consuming. Crowd-sourcing platforms such as Amazon's Mechanical Turk (MTurk) can be leveraged to create large training corpora for text classification tasks using social media data. Nevertheless, the quality of annotations can vary significantly, depending on the interpretations and motivations of the annotators completing the tasks. Further, the cost of labelling data through MTurk increases when the target messages are few and the data contains a significant amount of noise (e.g. promotional messages on Twitter). In this work, we propose a new annotation framework for creating high-quality human-annotated datasets for text classification from social media data. We present a pre-annotation technique based on zero-shot text classification that reduces the adverse effects of a highly skewed distribution of data across target classes. The proposed framework significantly reduces cost and time while maintaining the quality of the annotations. Being generic, it can be applied to annotating text data from any discipline. Our experiment annotating Twitter data with the proposed framework shows a cost reduction of 80% with no compromise in quality. |
---|---|
ISSN: | 2161-4407 |
DOI: | 10.1109/IJCNN52387.2021.9534335 |
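
The pre-annotation idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold, the sample messages, and the labels are all hypothetical, and `keyword_scorer` is a toy stand-in for a real zero-shot classifier, used only so the example runs without a model. The sketch shows one plausible reading of the approach: score each message against the target classes and forward only likely matches to human annotators, so that noise is not paid for.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a zero-shot classifier can be wrapped as (text, labels) -> {label: score}.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def keyword_scorer(text: str, labels: List[str]) -> Dict[str, float]:
    """Toy stand-in, NOT a real zero-shot model: 1.0 if the label word occurs in the text."""
    words = set(text.lower().split())
    return {label: (1.0 if label.lower() in words else 0.0) for label in labels}


def pre_annotate(
    messages: List[str],
    target_labels: List[str],
    scorer: Scorer,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split messages into (message, suggested label) pairs to annotate and filtered-out noise."""
    to_annotate: List[Tuple[str, str]] = []
    filtered: List[str] = []
    for msg in messages:
        scores = scorer(msg, target_labels)
        best_label = max(scores, key=scores.get)
        if scores[best_label] >= threshold:
            to_annotate.append((msg, best_label))
        else:
            filtered.append(msg)
    return to_annotate, filtered


# Hypothetical sample data for illustration only.
messages = [
    "flood warning issued for the river valley",
    "buy two get one free this weekend only",
    "earthquake felt across the city centre",
]
to_annotate, filtered = pre_annotate(messages, ["flood", "earthquake"], keyword_scorer)
```

Because `pre_annotate` only depends on the `Scorer` signature, the toy scorer could be swapped for a wrapper around an actual zero-shot model without changing the filtering logic.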