Iterative Reclassification in Agglomerative Clustering
Published in: | Journal of Computational and Graphical Statistics, December 2011, Vol. 20, No. 4, pp. 920-936 |
---|---|
Format: | Article |
Language: | English |
Summary: | In model-based clustering of complex data, a probability model, typically a finite mixture probability model, forms the basis of the distance measure between any pair of clusters. The idea of model-based clustering was popularized by the framework and accompanying software of Fraley and Raftery (2002). In particular, model-based agglomerative hierarchical clustering is now a frequently used approach for probabilistic grouping of data, due to the speed and simplicity of implementation. This article investigates deficiencies in the clusterings proposed from this popular approach, and presents a review of small refinements and extensions to the procedure with differing performance gains and computational costs. The improvements are illustrated through application to simulated and real data examples, including the clustering of gene expression time profiles. Some of the proposed improvements to agglomerative clustering are, like the procedure itself in its usual form, deterministic; perhaps surprisingly though, the best overall results here are obtained via a stochasticized version of the entire procedure. While the focus of this article is probability model-based clustering, many of the schemes presented are equally applicable to agglomerative clustering under any distance measure.
The simulated data from this article along with the C++ code used for implementing the algorithms for all of the examples can be obtained online from the Supplemental Material. |
ISSN: | 1061-8600 (print), 1537-2715 (online) |
DOI: | 10.1198/jcgs.2011.09111 |
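The summary describes agglomerative merging of clusters followed by refinements such as iterative reclassification. The following is a minimal Python sketch of that two-stage idea under explicit assumptions; it is not the article's C++ implementation. Ward's sum-of-squares merge cost stands in for the mixture-likelihood criterion (the two are closely related for spherical, equal-variance Gaussian components). "Reclassification" is assumed to mean repeatedly moving each point to the cluster with the nearest centroid until no point moves. The function names `agglomerate` and `reclassify` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Increase in within-cluster sum of squares if clusters a and b are merged
    (Ward's criterion, an assumed stand-in for a mixture-likelihood merge cost)."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def agglomerate(data: Sequence[Point], k: int) -> List[List[int]]:
    """Greedy agglomeration from singletons down to k clusters of point indices."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    clusters: List[List[int]] = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None  # (cost, i, j) of the cheapest merge found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([data[p] for p in clusters[i]],
                                 [data[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge j into i; j > i, so index i stays valid
        del clusters[j]
    return clusters


def reclassify(data: Sequence[Point], clusters: Sequence[Sequence[int]],
               max_iter: int = 100) -> List[int]:
    """Hypothetical reclassification: move each point to the cluster with the
    nearest centroid, recompute centroids, and repeat until no label changes."""
    labels = [0] * len(data)
    for c, members in enumerate(clusters):
        for p in members:
            labels[p] = c
    for _ in range(max_iter):
        # Centroids of the currently non-empty clusters, keyed by label.
        centers: Dict[int, Point] = {
            c: centroid([x for x, lab in zip(data, labels) if lab == c])
            for c in set(labels)
        }
        keys = sorted(centers)  # fixed order so ties break deterministically
        new_labels = [min(keys, key=lambda c: sq_dist(x, centers[c])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(agglomerate(pts, 2))                      # [[0, 1, 2], [3, 4, 5]]
    print(reclassify(pts, [[0, 1, 2, 3], [4, 5]]))  # [0, 0, 0, 1, 1, 1]
```

With two well-separated groups, merging recovers them directly. Starting reclassification from a deliberately wrong split moves point 3 back to its own group. The exhaustive pair search makes `agglomerate` at least cubic in the number of points, so it suits only small illustrations. The stochasticized variant that the summary reports as best overall is not sketched, because the summary does not specify how the merges are randomized.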