Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff
| Published in: | Transactions of the Association for Computational Linguistics, 2014, Vol. 2, p. 169-180 |
|---|---|
| Main Authors: | , , , , |
| Format: | Article |
| Language: | English |
Summary: In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff must be found for segment sizes. Choosing short segments allows us to reduce the number of highly confident labels that are supervised by the annotator, which is useful because these labels are often already correct and supervising correct labels is a waste of effort. In contrast, long segments reduce the cognitive effort due to context switches. Our method helps find the segmentation that optimizes supervision efficiency by defining user models to predict the cost and utility of supervising each segment and solving a constrained optimization problem balancing these contradictory objectives. A user study demonstrates noticeable gains over pre-segmented, confidence-ordered baselines on two natural language processing tasks: speech transcription and word segmentation.
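To make the cost-utility tradeoff described in the summary concrete, here is a minimal Python sketch that is not taken from the paper: it assumes per-token confidence scores from an automatic annotator, scores each candidate segment with a hypothetical utility model (expected corrections as the sum of 1 − confidence) and a hypothetical cost model (a fixed context-switch overhead plus linear per-token annotation time), and then selects non-overlapping segments under a time budget with a greedy utility-per-cost heuristic standing in for the paper's constrained optimization.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    start: int      # inclusive token index
    end: int        # exclusive token index
    utility: float  # expected number of label corrections in the segment
    cost: float     # predicted supervision time for the segment (seconds)


def candidate_segments(confidences, max_len=8,
                       per_token_time=1.0, switch_overhead=3.0):
    """Enumerate contiguous candidate segments and score each one.

    Utility model (assumed): expected corrections = sum of (1 - confidence).
    Cost model (assumed): context-switch overhead + per-token annotation time.
    """
    segments = []
    n = len(confidences)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            span = confidences[start:end]
            utility = sum(1.0 - c for c in span)
            cost = switch_overhead + per_token_time * len(span)
            segments.append(Segment(start, end, utility, cost))
    return segments


def select_segments(segments, budget):
    """Greedy stand-in for the constrained optimization: keep adding the
    non-overlapping segment with the best utility/cost ratio until the
    annotation-time budget is exhausted."""
    chosen, covered, remaining = [], set(), budget
    for seg in sorted(segments, key=lambda s: s.utility / s.cost, reverse=True):
        span = set(range(seg.start, seg.end))
        if seg.cost <= remaining and not (span & covered):
            chosen.append(seg)
            covered |= span
            remaining -= seg.cost
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    random.seed(0)
    # Toy confidence scores standing in for an automatic annotator's output.
    confidences = [round(random.uniform(0.4, 1.0), 2) for _ in range(30)]
    picked = select_segments(candidate_segments(confidences), budget=30.0)
    for seg in picked:
        print(f"supervise tokens [{seg.start}, {seg.end}) "
              f"utility={seg.utility:.2f} cost={seg.cost:.1f}")
```

In the paper itself, segment selection is driven by learned user models and solved as a constrained optimization; an exact solver (for instance, integer linear programming or dynamic programming over segment boundaries) could replace the greedy loop above without changing the surrounding interfaces.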
| ISSN: | 2307-387X |
|---|---|
| DOI: | 10.1162/tacl_a_00174 |