Information extraction from broadcast news
Published in: | Philosophical transactions of the Royal Society of London. Series A: Mathematical, physical, and engineering sciences, 2000-04, Vol.358 (1769), p.1295-1310 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite-state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram-based formulation is used for both models. The task of named-entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American broadcast news. |
---|---|
ISSN: | 1364-503X, 1471-2962 |
DOI: | 10.1098/rsta.2000.0587 |