Classification of Distributed Data Using Topic Modeling and Maximum Variation Sampling

Bibliographic Details
Main Authors: Patton, R M, Beaver, J M, Potok, T E
Format: Conference Proceeding
Language: English
Description
Summary: From a management perspective, understanding what information exists on a network and how it is distributed provides a critical advantage. This work explores topic modeling as a way to automatically determine the classes of information that exist on an organization's network, and then uses the resulting topics as centroid vectors for classifying individual documents, in order to understand how information topics are distributed across the enterprise network. The approach is tested using the 20 Newsgroups dataset.
ISSN: 1530-1605, 2572-6862
DOI: 10.1109/HICSS.2011.101
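
The summary describes using topics as centroid vectors to classify individual documents. The record does not give the paper's actual procedure, so the following is only a minimal sketch of that general idea: each document is assigned to the topic vector it is closest to. The topic vectors, the host names, the sample texts, and the choice of cosine similarity are all hypothetical stand-ins, not details taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, topics):
    """Return the label of the topic centroid most similar to the document."""
    doc = Counter(text.lower().split())  # simple bag-of-words term counts
    return max(topics, key=lambda label: cosine(doc, topics[label]))


# Hypothetical topic vectors (term -> weight). In practice these would come
# from a topic model fitted on the document collection; they are hard-coded
# here as an assumption so the sketch stays self-contained.
topics = {
    "space": {"orbit": 0.5, "launch": 0.3, "nasa": 0.2},
    "hockey": {"goal": 0.4, "team": 0.4, "ice": 0.2},
}

# Hypothetical documents keyed by the network host they were found on.
docs = {
    "host-a": "nasa delayed the launch into orbit",
    "host-b": "the team scored a late goal on the ice",
}

labels = {host: classify(text, topics) for host, text in docs.items()}
distribution = Counter(labels.values())  # number of documents per topic
```

Counting the assigned labels per host or per topic, as `distribution` does, gives a rough picture of how topics are spread across the network.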