A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections
| Field | Value |
|---|---|
| Published in | Journal of Intelligent Information Systems, 2002-03, Vol. 18 (2-3), p. 153 |
| Main Authors | , |
| Format | Article |
| Language | English |
| Summary | This paper presents a probabilistic mixture modeling framework for the hierarchic organization of document collections. It is demonstrated that the probabilistic corpus model which emerges from the automatic or unsupervised hierarchical organization of a document collection can be further exploited to create a kernel which boosts the performance of state-of-the-art support vector machine document classifiers. It is shown that the performance of such a classifier is further enhanced when employing the kernel derived from an appropriate hierarchic mixture model used for partitioning a document corpus rather than the kernel associated with a flat non-hierarchic mixture model. This has important implications for document classification when a hierarchic ordering of topics exists. This can be considered as the effective combination of documents with no topic or class labels (unlabeled data), labeled documents, and prior domain knowledge (in the form of the known hierarchic structure), in providing enhanced document classification performance. |
| ISSN | 0925-9902, 1573-7675 |
| DOI | 10.1023/A:1013677411002 |
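
The summary says a kernel is derived from the fitted corpus model but does not give its form. Purely as an illustration, the sketch below assumes a flat (non-hierarchic) mixture of multinomials over word counts, fitted by EM on all documents (labeled or not), and the posterior-based kernel K(x, y) = Σ_k P(k|x) P(k|y) / P(k). That kernel is an assumption chosen for simplicity, not necessarily the one used in the paper, and the hierarchic model the paper favours is not reproduced. The function names `fit_multinomial_mixture`, `posterior` and `mixture_kernel` are hypothetical.

```python
import math
import random


def fit_multinomial_mixture(docs, n_components, n_iter=50, alpha=0.01, seed=0):
    """Fit a flat mixture of multinomials to bag-of-words count vectors by EM.

    docs: list of equal-length lists of non-negative word counts.
    n_iter must be at least 1. Returns (priors, word_probs) = (P(k), P(w | k)).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_words = len(docs[0])
    eps = 1e-6  # keeps every prior strictly positive
    # Random soft assignments break the symmetry between components.
    resp = []
    for _ in docs:
        r = [rng.random() + 1e-3 for _ in range(n_components)]
        s = sum(r)
        resp.append([v / s for v in r])
    for _ in range(n_iter):
        # M-step: mixing weights and Laplace-style smoothed word distributions.
        priors = [(sum(r[k] for r in resp) + eps) / (len(docs) + n_components * eps)
                  for k in range(n_components)]
        word_probs = []
        for k in range(n_components):
            counts = [alpha] * n_words
            for r, d in zip(resp, docs):
                for w, c in enumerate(d):
                    counts[w] += r[k] * c
            total = sum(counts)
            word_probs.append([c / total for c in counts])
        # E-step: recompute the posterior P(k | doc) for every document.
        resp = [posterior(d, priors, word_probs) for d in docs]
    return priors, word_probs


def posterior(doc, priors, word_probs):
    """Return P(k | doc), computed in log space to avoid underflow."""
    logs = []
    for pk, pw in zip(priors, word_probs):
        ll = math.log(pk) + sum(c * math.log(p) for c, p in zip(doc, pw) if c)
        logs.append(ll)
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    s = sum(exps)
    return [e / s for e in exps]


def mixture_kernel(docs_a, docs_b, priors, word_probs):
    """Gram matrix K[i][j] = sum_k P(k | a_i) P(k | b_j) / P(k)."""
    post_a = [posterior(d, priors, word_probs) for d in docs_a]
    post_b = [posterior(d, priors, word_probs) for d in docs_b]
    return [[sum(pa[k] * pb[k] / priors[k] for k in range(len(priors)))
             for pb in post_b]
            for pa in post_a]


if __name__ == "__main__":
    # Toy corpus over a 4-word vocabulary with two loose "topics".
    docs = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [1, 0, 4, 5]]
    priors, word_probs = fit_multinomial_mixture(docs, n_components=2)
    for row in mixture_kernel(docs, docs, priors, word_probs):
        print(["%.3f" % v for v in row])
```

Because K = P D⁻¹ Pᵀ, with P the matrix of posteriors and D the diagonal matrix of priors, the Gram matrix is positive semidefinite by construction, so it can be handed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), training on the labeled subset while the mixture itself is fitted on every document. On well-separated toy data like the above, same-topic pairs typically receive larger kernel values, although EM only guarantees a local optimum.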