Graph Contrastive Topic Model
Published in: Expert Systems with Applications, 2024-12, Vol. 255, p. 124631, Article 124631
Main Authors: , , ,
Format: Article
Language: English
Summary: Contrastive learning has recently been introduced into neural topic models (NTMs) to improve latent semantic discovery, but existing methods suffer from the sample bias problem owing to their word frequency-based sampling strategy, which may produce false negative samples that are semantically similar to the prototypes. We propose the novel graph contrastive neural topic model (GCTM), built on a graph-based sampling strategy guided by in-depth correlation and irrelevance information among documents and words. We model each input document as a document-word bipartite graph (DWBG) and construct positive and negative word co-occurrence graphs (WCGs) to capture in-depth semantic correlation and irrelevance among words. Based on the DWBG and WCGs, we design a document-word information propagation (DWIP) process that perturbs the edges of the DWBG according to multi-hop correlations and irrelevance among documents and words. This yields the desired positive and negative samples, which are used for graph contrastive learning (GCL) together with the prototypes to improve the learning of document topic representations and latent topics. Experiments on several benchmark datasets demonstrate the effectiveness of our method for topic coherence and document representation learning compared with existing state-of-the-art methods.
Highlights:
• The paper proposes a graph contrastive neural topic model (GCTM) to improve semantic discovery.
• GCTM uses a graph-based sampling strategy to avoid the sample bias of word frequency-based methods.
• The model uses a document-word bipartite graph (DWBG) and word co-occurrence graphs (WCGs).
• A document-word information propagation (DWIP) process performs edge perturbation of the DWBG.
• Experiments show GCTM improves topic coherence and document representation learning.
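As a rough illustration of the pipeline described in the summary, the NumPy sketch below builds a DWBG, derives positive and negative WCGs from word co-occurrence, and propagates document-word weights through each word graph to produce augmented views, followed by an InfoNCE-style contrastive loss. This is not the authors' code: the TF weighting, the Jaccard-style co-occurrence score, the thresholds, the hop count, and all helper names (build_dwbg, build_wcgs, dwip, info_nce) are illustrative assumptions based only on the abstract.

```python
import numpy as np

def build_dwbg(doc_word_counts):
    """Document-word bipartite graph as a (D x V) weight matrix.
    Plain term-frequency weights are an assumption; the record does
    not specify the paper's exact edge weighting."""
    totals = np.maximum(doc_word_counts.sum(axis=1, keepdims=True), 1)
    return doc_word_counts / totals

def build_wcgs(doc_word_counts, pos_thresh=0.1, neg_thresh=0.01):
    """Positive/negative word co-occurrence graphs (V x V).
    Frequently co-occurring word pairs form positive edges; pairs that
    almost never co-occur form negative edges. Thresholds are illustrative."""
    binary = (doc_word_counts > 0).astype(float)
    cooc = binary.T @ binary                 # raw co-occurrence counts
    freq = binary.sum(axis=0)
    denom = freq[:, None] + freq[None, :] - cooc
    score = cooc / np.maximum(denom, 1)      # Jaccard-style co-occurrence score
    np.fill_diagonal(score, 0.0)
    wcg_pos = (score >= pos_thresh).astype(float)
    wcg_neg = (score <= neg_thresh).astype(float)
    np.fill_diagonal(wcg_neg, 0.0)           # no self-loops in the negative graph
    return wcg_pos, wcg_neg

def dwip(dwbg, wcg, hops=2):
    """Document-word information propagation: perturb DWBG edges by
    propagating document-word weights through a word graph for a few
    hops, yielding an augmented view of each document."""
    adj = wcg / np.maximum(wcg.sum(axis=1, keepdims=True), 1)
    out = dwbg.copy()
    for _ in range(hops):
        out = out @ adj
    return out

def info_nce(z, z_pos, z_neg, tau=0.5):
    """InfoNCE-style contrastive loss: pull each prototype toward its
    positive view and push it away from its negative view."""
    def sim(a, b):
        a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
        b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
        return np.exp((a * b).sum(axis=1) / tau)
    s_pos, s_neg = sim(z, z_pos), sim(z, z_neg)
    return -np.log(s_pos / (s_pos + s_neg)).mean()

# Toy usage: 3 documents over a 5-word vocabulary.
counts = np.array([[2, 1, 0, 0, 1],
                   [0, 3, 1, 0, 0],
                   [1, 0, 0, 2, 2]], dtype=float)
dwbg = build_dwbg(counts)
wcg_pos, wcg_neg = build_wcgs(counts)
x_pos = dwip(dwbg, wcg_pos)   # positive sample: semantically correlated view
x_neg = dwip(dwbg, wcg_neg)   # negative sample: semantically irrelevant view
loss = info_nce(dwbg, x_pos, x_neg)
```

In the paper the contrastive loss would operate on encoded document topic representations rather than the raw bag-of-words views used here as a stand-in.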
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.124631