Improved fast partitional clustering algorithm for text clustering

Bibliographic Details
Published in: Journal of intelligent & fuzzy systems, 2020-01, Vol. 39 (2), p. 2137-2145
Main Authors: Bejos, Sebastián; Feliciano-Avelino, Ivan; Martínez-Trinidad, J. Fco; Carrasco-Ochoa, J. A.
Format: Article
Language: English
Description
Summary: Document clustering has become an important task for processing the large amount of textual information available on the Internet. k-means is the most widely used clustering algorithm, mainly because of its simplicity and effectiveness. However, k-means becomes slow for large, high-dimensional datasets such as document collections. The FPAC algorithm was recently proposed to mitigate this problem, but its speed improvement came at the cost of lower clustering quality. For this reason, in this paper we introduce an improved FPAC algorithm which, according to our experiments on different document collections, obtains better clustering results than FPAC without greatly increasing the runtime.
ISSN: 1064-1246, 1875-8967
DOI: 10.3233/JIFS-179879