Loading…

Leveraging Multiple Representations of Topic Models for Knowledge Discovery

Topic models are often useful in categorization of related documents in information retrieval and knowledge discovery systems, especially for large datasets. Interpreting the output of these models remains an ongoing challenge for the research community. The typical practice in the application of to...

Full description

Saved in:
Bibliographic Details
Published in:IEEE access 2022, Vol.10, p.104696-104705
Main Authors: Potts, Colin M., Savaliya, Akshat, Jhala, Arnav
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Topic models are often useful in categorization of related documents in information retrieval and knowledge discovery systems, especially for large datasets. Interpreting the output of these models remains an ongoing challenge for the research community. The typical practice in the application of topic models is to tune the parameters of a chosen model for a target dataset and select the model with the best output based on a given metric. We present a novel perspective on topic analysis by presenting a process for combining output from multiple models with different theoretical underpinnings. We show that this results in our ability to tackle novel tasks such as semantic characterization of content that cannot be carried out by using single models. One example task is to characterize the differences between topics or documents in terms of their purpose and also importance with respect to the underlying output of the discovery algorithm. To show the potential benefit of leveraging multiple models we present an algorithm to map the term-space of Latent Dirichlet Allocation (LDA) to the neural document-embedding space of doc2vec. We also show that by utilizing both models in parallel and analyzing the resulting document distributions using the Normalized Pointwise Mutual Information (NPMI) metric we can gain insight into the purpose and importance of topics across models. This approach moves beyond topic identification to a richer characterization of the information and provides a better understanding of the complex relationships between these typically competing techniques.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2022.3210529