Loading…
Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees
Efficient methods for storing and querying are critical for scaling high-order -gram language models to large corpora. We propose a language model based on , a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabil...
Saved in:
Published in: | Transactions of the Association for Computational Linguistics 2021-03, Vol.4, p.477-490 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Efficient methods for storing and querying are critical for scaling high-order
-gram language models to large corpora. We
propose a language model based on
, a representation that is highly compact and can be easily
held in memory, while supporting queries needed in computing language model
probabilities on-the-fly. We present several optimisations which improve query
runtimes up to 2500Ă—, despite only incurring a modest increase in construction
time and memory usage. For large corpora and high Markov orders, our method is
highly competitive with the state-of-the-art KenLM package. It imposes much
lower memory requirements, often by orders of magnitude, and has runtimes that
are either similar (for training) or comparable (for querying). |
---|---|
ISSN: | 2307-387X 2307-387X |
DOI: | 10.1162/tacl_a_00112 |