On Structuring Probabilistic Dependencies in Stochastic Language Modelling

Bibliographic Details
Published in: Computer Speech & Language, 1994-01, Vol. 8 (1), p. 1-38
Main Authors: Ney, Hermann; Essen, Ute; Kneser, Reinhard
Format: Article
Language: English
Description
Summary: Three methods for structuring the probabilistic dependencies in stochastic language models were tested & compared: (1) nonlinear interpolation as an alternative to linear interpolation, (2) statistical clustering for finding word categories, & (3) cache memory & word associations. Two test corpora, a German database of 14,000 words & an English database of 50,000 words, were used to test these models. The leaving-one-out method was used effectively to estimate the parameters of both the nonlinear & the linear interpolation. Findings show that nonlinear interpolation yields a significant improvement over linear interpolation. The second method, a statistical clustering procedure for determining word equivalence classes, was also tested successfully. The word association model was used to cover long-distance dependencies; so far it has yielded only preliminary results. The cache model, which is related to the word association model in that it can be regarded as a self-association, gave improvements comparable to those of the nonlinear interpolation model, though with somewhat higher perplexities. All experimental results are presented in detail. 8 Tables, 4 Figures, 20 References. C. Brennan
ISSN: 0885-2308
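The summary names the estimators but does not define them. As a rough illustration only, the Python sketch below contrasts linear interpolation of a bigram model with a unigram model against absolute discounting, a nonlinear scheme in which a fixed discount b is subtracted from every seen bigram count. In the sketch, b is set by the leaving-one-out estimate b = n1 / (n1 + 2·n2), where n_r counts the distinct bigrams seen exactly r times. Several choices are assumptions of this sketch and are not taken from the record: reading "nonlinear interpolation" as absolute discounting, the fixed weight λ = 0.5 (the paper estimates its parameters by leaving-one-out), the add-one unigram, and every name (`BigramModel`, `loo_discount`, `perplexity`).

```python
from collections import Counter
import math


def loo_discount(bigrams):
    """Leaving-one-out estimate of the absolute discount, b = n1 / (n1 + 2*n2),
    where n_r is the number of distinct bigrams seen exactly r times."""
    count_of_counts = Counter(bigrams.values())
    n1, n2 = count_of_counts[1], count_of_counts[2]
    if n1 + n2 == 0:
        return 0.5  # arbitrary fallback for degenerate corpora (assumption)
    return n1 / (n1 + 2 * n2)


class BigramModel:
    """Hypothetical bigram model comparing linear vs. absolute-discount smoothing."""

    def __init__(self, tokens, lam=0.5):
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(self.unigrams)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # c(v): how often v occurs as the history of some bigram
        self.history = Counter(tokens[:-1])
        # number of distinct words observed after history v
        self.successors = Counter(v for v, _ in self.bigrams)
        self.lam = lam  # fixed linear weight (assumption; not estimated here)
        self.b = loo_discount(self.bigrams)

    def p_unigram(self, w):
        # Add-one smoothed unigram with one extra slot for unseen words (assumption)
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size + 1)

    def p_linear(self, v, w):
        # Linear interpolation: lam * relative frequency + (1 - lam) * unigram
        c_v = self.history[v]
        if c_v == 0:
            return self.p_unigram(w)
        rel = self.bigrams[(v, w)] / c_v
        return self.lam * rel + (1 - self.lam) * self.p_unigram(w)

    def p_absolute(self, v, w):
        # Absolute discounting (interpolated form): subtract b from each seen
        # count and give the freed mass, b * successors(v) / c(v), to the unigram
        c_v = self.history[v]
        if c_v == 0:
            return self.p_unigram(w)
        discounted = max(self.bigrams[(v, w)] - self.b, 0.0) / c_v
        backoff_mass = self.b * self.successors[v] / c_v
        return discounted + backoff_mass * self.p_unigram(w)


def perplexity(prob, tokens):
    """Bigram perplexity of `tokens` under the conditional model `prob(v, w)`."""
    log_sum = sum(math.log(prob(v, w)) for v, w in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


train = "the cat sat on the mat the cat ran".split()
model = BigramModel(train)
print(model.b)  # 0.75 (n1 = 6 singleton bigrams, n2 = 1 doubleton)
```

The design difference is in the unigram weight: in the linear model it is the constant 1 - λ, whereas in the discounted model it is b · successors(v) / c(v), which depends on the counts of each history. Comparing `perplexity(model.p_linear, test)` with `perplexity(model.p_absolute, test)` on held-out tokens is the kind of measurement the summary reports, though this toy corpus says nothing about the paper's actual results.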