Information extraction from broadcast news

Bibliographic Details
Published in: Philosophical transactions of the Royal Society of London. Series A: Mathematical, physical, and engineering sciences, 2000-04, Vol. 358 (1769), p. 1295-1310
Main Authors: Gotoh, Yoshihiko; Renals, Steve
Format: Article
Language: English
Description
Summary: This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite-state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram-based formulation is used for both models. The task of named-entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using Hub-4E, the DARPA/NIST evaluation for North American broadcast news.
ISSN: 1364-503X
EISSN: 1471-2962
DOI: 10.1098/rsta.2000.0587
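
Illustrative sketch: the first model in the summary treats the name class as an attribute of each word. One way to picture that idea is a bigram model over joint (word, class) tokens, decoded with a Viterbi search over the class attached to each word. The code below is a minimal sketch under stated assumptions and is not the authors' implementation: the toy sentences, the three-label tag set, the function names `train`, `log_prob`, and `tag`, and the add-k smoothing (chosen here only for brevity) are all hypothetical.

```python
import math

# Toy data invented for illustration; each token is a (word, class) pair.
TRAIN = [
    [("alice", "PERSON"), ("visited", "O"), ("paris", "LOCATION")],
    [("bob", "PERSON"), ("visited", "O"), ("london", "LOCATION")],
    [("paris", "LOCATION"), ("welcomed", "O"), ("bob", "PERSON")],
]
START = ("<s>", "<s>")  # sentence-start token (an assumption of this sketch)


def train(sentences):
    """Count bigrams over joint (word, class) tokens."""
    bigram, context, vocab, classes = {}, {}, set(), set()
    for sent in sentences:
        prev = START
        for tok in sent:
            bigram[(prev, tok)] = bigram.get((prev, tok), 0) + 1
            context[prev] = context.get(prev, 0) + 1
            vocab.add(tok)
            classes.add(tok[1])
            prev = tok
    return bigram, context, vocab, sorted(classes)


def log_prob(model, prev, tok, k=0.1):
    """Add-k smoothed log P(tok | prev); one extra slot covers unseen tokens."""
    bigram, context, vocab, _ = model
    num = bigram.get((prev, tok), 0) + k
    den = context.get(prev, 0) + k * (len(vocab) + 1)
    return math.log(num / den)


def tag(model, words, k=0.1):
    """Viterbi search over the class attribute attached to each word."""
    classes = model[3]
    best = {"<s>": (0.0, [])}  # class of previous token -> (score, path)
    prev_word = "<s>"
    for w in words:
        new_best = {}
        for c in classes:
            new_best[c] = max(
                (score + log_prob(model, (prev_word, pc), (w, c), k), path + [c])
                for pc, (score, path) in best.items()
            )
        best, prev_word = new_best, w
    return max(best.values())[1]


model = train(TRAIN)
predicted = tag(model, ["bob", "visited", "paris"])
```

The test sentence does not occur verbatim in the toy data, so its labels depend on the smoothed bigram estimates rather than on a memorized sequence. With so few counts, the choice of smoothing dominates the result, which is the sparse-data issue the summary raises.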