Class-dependent projection based method for text categorization

Bibliographic Details
Published in: Pattern Recognition Letters, 2011-07, Vol. 32 (10), p. 1493-1501
Main Authors: Chen, Lifei; Guo, Gongde; Wang, Kaijun
Format: Article
Language: English
Description
Summary:
► We introduce a class-dependent projection method for text categorization. In the new method, each document category is projected into its own reduced subspace so that different classes become easily separable. The subspaces corresponding to different classes are generated using a soft feature weighting scheme, and differ from one another.
► We extend the traditional centroid-based classifier (CBC) to obtain a simple but effective classifier for text categorization. The new classifier inherits the simplicity and efficiency of CBC, but significantly improves classification accuracy by using our class-dependent projection method in both its training and testing phases.
► The experiments indicate that the new classifier is robust with respect to the number of terms used to represent a document, and can outperform SVM-based classifiers when the document categories overlap considerably.

Text categorization presents unique challenges to traditional classification methods because of the large number of features inherent in datasets from real-world applications and the large number of training samples. In high-dimensional document data, the classes are typically characterized only by subsets of features, and these subsets typically differ between classes of different topics. This paper presents a simple but effective classifier for text categorization based on class-dependent projection. By projecting onto a set of individual subspaces, the samples belonging to different document classes are separated so that they are easy to classify. This is achieved by developing a new supervised feature weighting algorithm to learn the optimized subspaces for all the document classes. The experiments carried out on common benchmark corpora showed that the proposed method achieved both higher classification accuracy and lower computational cost than some established classifiers in text categorization, especially for datasets that include document categories with overlapping topics.
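
The general idea of a centroid-based classifier with per-class soft feature weights can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the record does not give the weighting formula, so the sketch assumes a simple heuristic (a softmax over negative within-class dispersion) and a weighted squared Euclidean distance. The function names `fit_class_subspaces` and `predict` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def fit_class_subspaces(X, y, temperature=1.0):
    """Learn one centroid and one soft feature-weight vector per class.

    Assumption (not from the paper): a feature gets a larger weight for a
    class when its within-class dispersion is small, via a softmax.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centroids, weights = [], []
    for c in classes:
        Xc = X[y == c]
        centroid = Xc.mean(axis=0)
        dispersion = ((Xc - centroid) ** 2).mean(axis=0)
        logits = -dispersion / temperature
        logits -= logits.max()          # numerical stability
        w = np.exp(logits)
        w /= w.sum()                    # soft weights sum to 1 per class
        centroids.append(centroid)
        weights.append(w)
    return classes, np.vstack(centroids), np.vstack(weights)

def predict(X, classes, centroids, weights):
    """Assign each sample to the class with the nearest centroid,
    measuring distance in that class's own weighted subspace."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - centroids[None, :, :]          # (n, k, d)
    dist = (weights[None, :, :] * diff ** 2).sum(axis=2)  # (n, k)
    return classes[dist.argmin(axis=1)]

# Toy term-weight vectors: class 0 is tight on feature 0, class 1 on feature 1.
X_train = [[1.0, 0.0, 0.2], [1.0, 0.1, 0.9], [1.0, 0.0, 0.5],
           [0.0, 1.0, 0.3], [0.1, 1.0, 0.8], [0.0, 1.0, 0.4]]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_class_subspaces(X_train, y_train)
print(predict([[0.9, 0.1, 0.5]], *model))
```

Because each class has its own weight vector, the same document is compared with each centroid in a different subspace, which is the property the summary attributes to class-dependent projection.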
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2011.01.018