Topic-weak-correlated Latent Dirichlet allocation

Bibliographic Details
Main Authors: Yimin Tan, Zhijian Ou
Format: Conference Proceeding
Language: English
Description
Summary: Latent Dirichlet allocation (LDA) has been widely used for analyzing large text corpora. In this paper we propose the topic-weak-correlated LDA (TWC-LDA) for topic modeling, which constrains different topics to be weakly correlated. This is technically achieved by placing a special prior over the topic-word distributions. Reducing the overlap between the topic-word distributions makes the learned topics more interpretable, in the sense that each topic-word distribution can be clearly associated with a distinctive semantic meaning. Experimental results on both synthetic and real-world corpora show the superiority of TWC-LDA over basic LDA for semantically meaningful topic discovery and document classification.
DOI: 10.1109/ISCSLP.2010.5684906