Improving clustering by learning a bi-stochastic data similarity matrix

Bibliographic Details
Published in: Knowledge and Information Systems, 2012-08, Vol. 32 (2), p. 351-382
Main Authors: Wang, Fei; Li, Ping; König, Arnd Christian; Wan, Muting
Format: Article
Language: English
Description
Summary: An idealized clustering algorithm seeks to learn a cluster-adjacency matrix such that, if two data points belong to the same cluster, the corresponding entry would be 1; otherwise, the entry would be 0. This integer (1/0) constraint makes it difficult to find the optimal solution. We propose a relaxation on the cluster-adjacency matrix, by deriving a bi-stochastic matrix from a data similarity (e.g., kernel) matrix according to the Bregman divergence. Our general method is named the Bregmanian Bi-Stochastication (BBS) algorithm. We focus on two popular choices of the Bregman divergence: the Euclidean distance and the Kullback–Leibler (KL) divergence. Interestingly, the BBS algorithm using the KL divergence is equivalent to the Sinkhorn–Knopp (SK) algorithm for deriving bi-stochastic matrices. We show that the BBS algorithm using the Euclidean distance is closely related to the relaxed k-means clustering and can often produce noticeably superior clustering results to the SK algorithm (and other algorithms such as Normalized Cut), through extensive experiments on public data sets.
ISSN: 0219-1377
EISSN: 0219-3116
DOI: 10.1007/s10115-011-0433-1
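
The summary states that BBS under the KL divergence is equivalent to the Sinkhorn–Knopp (SK) algorithm for deriving bi-stochastic matrices. As a minimal illustration of that special case only (this sketch is not taken from the article; the function name, parameters, and toy data below are assumptions), the following Python snippet alternately normalizes the rows and columns of a strictly positive similarity matrix until it is numerically bi-stochastic:

import numpy as np

def sinkhorn_knopp(K, n_iter=1000, tol=1e-9):
    # Alternate row/column normalization of a nonnegative matrix K.
    # For a strictly positive similarity (kernel) matrix this converges
    # to a bi-stochastic matrix (all row and column sums equal to 1).
    P = np.asarray(K, dtype=float).copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)  # rows now sum to 1
        P /= P.sum(axis=0, keepdims=True)  # columns now sum to 1
        # after the column step, only the row sums can still deviate
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    return P

# Toy usage: an RBF (Gaussian) kernel over a few random 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists)                   # strictly positive, symmetric
P = sinkhorn_knopp(K)
print(np.allclose(P.sum(axis=0), 1.0),  # True
      np.allclose(P.sum(axis=1), 1.0))  # True (up to tol)

The article's contribution is the general Bregman-divergence view, including the Euclidean case that it relates to relaxed k-means; the snippet above covers only the KL/SK special case mentioned in the summary.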