Loading…
Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech
Latent topic modeling has proven to be an effective means for learning the underlying semantic content within document collections. Latent topic modeling has traditionally been applied to bag-of-words representations that ignore word sequence information that can aid in semantic understanding. In th...
Saved in:
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Latent topic modeling has proven to be an effective means for learning the underlying semantic content within document collections. Latent topic modeling has traditionally been applied to bag-of-words representations that ignore word sequence information that can aid in semantic understanding. In this work we introduce a method for efficiently incorporating arbitrarily long word sequences into a topic modeling approach. This method iteratively constructs a constrained set of phrase trees in an unsupervised fashion from a document collection using weighted pointwise mutual information statistics to guide the process. In experiments on the Fisher Corpus of conversational speech, the incorporation of learned phrases into a latent topic model yielded significant improvements in the unsupervised discovery of the known topics present within the data. |
---|---|
DOI: | 10.1109/SLT.2012.6424226 |