Big data clustering via random sketching and validation
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | As the volume and dimensionality of data increase, the development of new, efficient processing tools has become a necessity. The present paper introduces a novel dimensionality reduction approach for fast and efficient clustering of high-dimensional data. The new methods extend random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to the dimensionality reduction problem. The advocated random sketching and validation K-means (SkeVa K-means) and Divergence SkeVa algorithms can achieve high performance, with the latter incurring a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives. |
---|---|
ISSN: | 2576-2303 |
DOI: | 10.1109/ACSSC.2014.7094614 |