Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization

Bibliographic Details
Published in: Frontiers of information technology & electronic engineering 2013-06, Vol.14 (6), p.407-416
Main Authors: Du, Min, Chen, Xing-shu
Format: Article
Language: English
Description
Summary: Text categorization is a significant technique to manage the surging text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: usually, near points in space are also near when they are projected onto a direction, which means that distant points in the projection direction are also distant in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k-nearest neighbors of a query point, which greatly decreases the computation cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN, with little degradation in accuracy. Specifically, it is superior in applications that have large and high-dimensional datasets.
ISSN: 1869-1951
2095-9184
1869-196X
2095-9230
DOI: 10.1631/jzus.C1200303
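
The principle stated in the summary can be illustrated with a short sketch. The record gives no implementation details, so everything below is an assumption rather than the authors' procedure: the function names `fit_projection` and `knn_pruned` are hypothetical, the projection direction is taken to be the first principal component, and pruning uses the fact that the gap between two points along a unit direction never exceeds their Euclidean distance. This variant returns exact neighbors; the summary reports a small loss of accuracy, which suggests the published method prunes more aggressively.

```python
import numpy as np


def fit_projection(X):
    """Project training points onto their first principal direction.

    Returns the data mean, the unit direction, the projections sorted
    in ascending order, and the permutation that sorts them.
    """
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    direction = vt[0]
    proj = (X - mean) @ direction
    order = np.argsort(proj)
    return mean, direction, proj[order], order


def knn_pruned(X, model, q, k):
    """Return indices of the k nearest rows of X to q, nearest first.

    Candidates are visited in order of increasing projection gap.
    The gap is a lower bound on the true distance, so the search can
    stop once the gap exceeds the current k-th best distance.
    """
    mean, direction, sorted_proj, order = model
    qp = (q - mean) @ direction
    n = len(order)
    hi = int(np.searchsorted(sorted_proj, qp))  # first point at or above qp
    lo = hi - 1                                 # last point below qp
    best = []  # (distance, index) pairs, sorted, at most k entries
    while lo >= 0 or hi < n:
        gap_lo = qp - sorted_proj[lo] if lo >= 0 else np.inf
        gap_hi = sorted_proj[hi] - qp if hi < n else np.inf
        if gap_lo <= gap_hi:
            gap, pos = gap_lo, lo
            lo -= 1
        else:
            gap, pos = gap_hi, hi
            hi += 1
        # Every unvisited point has a projection gap at least this large.
        if len(best) == k and gap > best[-1][0]:
            break
        idx = int(order[pos])
        dist = float(np.linalg.norm(X[idx] - q))
        best.append((dist, idx))
        best.sort()
        del best[k:]
    return [idx for _, idx in best]
```

In use, `fit_projection` would be called once on the training matrix and the returned model reused for every query, so that only the points surviving the projection test incur a full distance computation.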