On Structuring Probabilistic Dependencies in Stochastic Language Modelling
Published in: | Computer Speech & Language 1994-01, Vol. 8 (1), pp. 1-38 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | Three methods for structuring the probabilistic dependencies in stochastic language models were tested and compared: (1) nonlinear interpolation as an alternative to linear interpolation, (2) statistical clustering to find word categories, and (3) cache memory and word associations. The models were evaluated on two test corpora: a German database of 14,000 words and an English database of 50,000 words. The parameters of both the nonlinear and the linear interpolation were estimated effectively with the leaving-one-out method. The results show that nonlinear interpolation yields a significant improvement over linear interpolation. The second method, a statistical clustering procedure for determining word equivalence classes, was also tested successfully. The word association model was used to capture long-distance dependencies; so far it has yielded only preliminary results. The cache model, which is related to the word association model and can be regarded as a self-association, yielded improvements comparable to those of nonlinear interpolation, although with somewhat higher perplexities. All experimental results are presented in detail. 8 Tables, 4 Figures, 20 References. C. Brennan |
ISSN: | 0885-2308 |
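The record does not give the model equations. As an illustration only, the Python sketch below contrasts the two interpolation schemes on a bigram model. It assumes that "nonlinear interpolation" takes the form of absolute discounting. The discount b is set with the common leaving-one-out approximation b = n1 / (n1 + 2 n2), where n_r is the number of distinct bigrams seen exactly r times. The linear-interpolation weight lambda is chosen by a grid search on the leaving-one-out log-likelihood of the training bigrams. The class name `BigramLM`, the add-one unigram fallback, the `<unk>` token, the 0.5 fallback discount, and the toy corpus are all hypothetical choices made for this sketch; it is not the authors' implementation. Perplexity is computed as the exponential of the average negative log-probability per predicted word.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical out-of-vocabulary token


class BigramLM:
    """Bigram model with linear interpolation or absolute discounting (a sketch)."""

    def __init__(self, tokens):
        self.vocab = set(tokens) | {UNK}
        self.uni = Counter(tokens)
        self.total = len(tokens)
        self.bi = Counter(zip(tokens, tokens[1:]))
        # N(v): number of times v occurs as a bigram history.
        self.hist = Counter()
        for (v, _), c in self.bi.items():
            self.hist[v] += c
        # n_+(v): number of distinct words observed after v.
        self.succ = Counter(v for v, _ in self.bi)
        # Leaving-one-out approximation of the discount: b = n1 / (n1 + 2 * n2).
        n = Counter(self.bi.values())
        self.b = n[1] / (n[1] + 2 * n[2]) if n[1] else 0.5  # 0.5: arbitrary fallback

    def p_uni(self, w):
        # Add-one unigram distribution over the vocabulary (assumed fallback model).
        return (self.uni[w] + 1) / (self.total + len(self.vocab))

    def p_linear(self, v, w, lam):
        # Linear interpolation: (1 - lam) * relative frequency + lam * unigram.
        nv = self.hist[v]
        if nv == 0:
            return self.p_uni(w)
        return (1 - lam) * self.bi[(v, w)] / nv + lam * self.p_uni(w)

    def p_absdisc(self, v, w):
        # Absolute discounting: subtract b from every seen count and give the
        # freed mass, b * n_+(v) / N(v), to the unigram distribution.
        nv = self.hist[v]
        if nv == 0:
            return self.p_uni(w)
        seen = max(self.bi[(v, w)] - self.b, 0.0) / nv
        return seen + self.b * self.succ[v] / nv * self.p_uni(w)

    def loo_lambda(self, grid=None):
        # Pick lam maximising the leaving-one-out log-likelihood of the training
        # bigrams: each occurrence is scored with its own count removed.
        # Simplification: the unigram distribution is not re-estimated per fold.
        grid = grid or [i / 20 for i in range(1, 20)]

        def loo_loglik(lam):
            ll = 0.0
            for (v, w), c in self.bi.items():
                nv = self.hist[v] - 1
                if nv == 0:
                    continue  # history unseen once held out: unigram only, independent of lam
                p = (1 - lam) * (c - 1) / nv + lam * self.p_uni(w)
                ll += c * math.log(p)
            return ll

        return max(grid, key=loo_loglik)


def perplexity(prob, tokens, vocab):
    """exp of the average negative log-probability of each word given its predecessor."""
    toks = [t if t in vocab else UNK for t in tokens]
    logp = sum(math.log(prob(v, w)) for v, w in zip(toks, toks[1:]))
    return math.exp(-logp / (len(toks) - 1))


# Toy usage; these corpora are made up and far too small to be meaningful.
train = "the cat sat on the mat the dog sat on the log the cat ran".split()
test = "the dog sat on the rug".split()
lm = BigramLM(train)
lam = lm.loo_lambda()
ppl_linear = perplexity(lambda v, w: lm.p_linear(v, w, lam), test, lm.vocab)
ppl_absdisc = perplexity(lm.p_absdisc, test, lm.vocab)
print(f"lambda={lam:.2f}  b={lm.b:.3f}")
print(f"perplexity: linear={ppl_linear:.2f}  absolute discounting={ppl_absdisc:.2f}")
```

On a corpus this small the two perplexities say nothing about the results reported in the article. The sketch only shows where a leaving-one-out estimate enters each model: a single closed-form discount for the nonlinear scheme, and a likelihood-maximizing weight for the linear one.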