Loading…

RIC: Parameter-free noise-robust clustering

How do we find a natural clustering of a real-world point set which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? As most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input para...

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on knowledge discovery from data 2007-12, Vol.1 (3), p.10
Main Authors: Böhm, Christian, Faloutsos, Christos, Pan, Jia-Yu, Plant, Claudia
Format: Article
Language:English
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites cdi_FETCH-LOGICAL-c110t-e5fae34614293f671b0c7c2f409f587ca8aa610df1f29062ebce9de67a0978663
container_end_page
container_issue 3
container_start_page 10
container_title ACM transactions on knowledge discovery from data
container_volume 1
creator Böhm, Christian
Faloutsos, Christos
Pan, Jia-Yu
Plant, Claudia
description How do we find a natural clustering of a real-world point set which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? As most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input parameters, and are sensitive to noise. In this article, we propose a robust framework for determining a natural clustering of a given dataset, based on the minimum description length (MDL) principle. The proposed framework, robust information-theoretic clustering (RIC) , is orthogonal to any known clustering algorithm: Given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously determines the most natural amount and shape (subspace) of the clusters. Our RIC method can be combined with any clustering technique ranging from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC is even able to purify and improve an initial coarse clustering, even if we start with very simple methods. In an extension, we propose a fully automatic stand-alone clustering method and efficiency improvements. RIC scales well with the dataset size. Extensive experiments on synthetic and real-world datasets validate the proposed RIC framework.
doi_str_mv 10.1145/1297332.1297334
format article
fullrecord <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1145_1297332_1297334</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1145_1297332_1297334</sourcerecordid><originalsourceid>FETCH-LOGICAL-c110t-e5fae34614293f671b0c7c2f409f587ca8aa610df1f29062ebce9de67a0978663</originalsourceid><addsrcrecordid>eNo1z8kKwjAUheHgANZh7VO0vTfDTbOU4lAQBFFwV2JMQFGUxo1vL9K6-s7qwM_YHCFDlCpHbrQQPGuVPZagUpRKzU-D_6YCR2wc4w1AKUSesP6-KqdsGOw9-lnnhB1Xy0O5Sbe7dVUutqlDhHfqVbBeSELJjQik8QxOOx4kmKAK7WxhLSFcAgZugLg_O28unrQFowsiMWF5--uaZ4yND_WruT5s86kR6l9D3TV0SvEFbNU2MA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>RIC: Parameter-free noise-robust clustering</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Böhm, Christian ; Faloutsos, Christos ; Pan, Jia-Yu ; Plant, Claudia</creator><creatorcontrib>Böhm, Christian ; Faloutsos, Christos ; Pan, Jia-Yu ; Plant, Claudia</creatorcontrib><description>How do we find a natural clustering of a real-world point set which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? As most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input parameters, and are sensitive to noise. In this article, we propose a robust framework for determining a natural clustering of a given dataset, based on the minimum description length (MDL) principle. The proposed framework, robust information-theoretic clustering (RIC) , is orthogonal to any known clustering algorithm: Given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously determines the most natural amount and shape (subspace) of the clusters. Our RIC method can be combined with any clustering technique ranging from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC is even able to purify and improve an initial coarse clustering, even if we start with very simple methods. In an extension, we propose a fully automatic stand-alone clustering method and efficiency improvements. RIC scales well with the dataset size. Extensive experiments on synthetic and real-world datasets validate the proposed RIC framework.</description><identifier>ISSN: 1556-4681</identifier><identifier>EISSN: 1556-472X</identifier><identifier>DOI: 10.1145/1297332.1297334</identifier><language>eng</language><ispartof>ACM transactions on knowledge discovery from data, 2007-12, Vol.1 (3), p.10</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c110t-e5fae34614293f671b0c7c2f409f587ca8aa610df1f29062ebce9de67a0978663</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Böhm, Christian</creatorcontrib><creatorcontrib>Faloutsos, Christos</creatorcontrib><creatorcontrib>Pan, Jia-Yu</creatorcontrib><creatorcontrib>Plant, Claudia</creatorcontrib><title>RIC: Parameter-free noise-robust clustering</title><title>ACM transactions on knowledge discovery from data</title><description>How do we find a natural clustering of a real-world point set which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? As most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input parameters, and are sensitive to noise. In this article, we propose a robust framework for determining a natural clustering of a given dataset, based on the minimum description length (MDL) principle. The proposed framework, robust information-theoretic clustering (RIC) , is orthogonal to any known clustering algorithm: Given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously determines the most natural amount and shape (subspace) of the clusters. Our RIC method can be combined with any clustering technique ranging from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC is even able to purify and improve an initial coarse clustering, even if we start with very simple methods. In an extension, we propose a fully automatic stand-alone clustering method and efficiency improvements. RIC scales well with the dataset size. Extensive experiments on synthetic and real-world datasets validate the proposed RIC framework.</description><issn>1556-4681</issn><issn>1556-472X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><recordid>eNo1z8kKwjAUheHgANZh7VO0vTfDTbOU4lAQBFFwV2JMQFGUxo1vL9K6-s7qwM_YHCFDlCpHbrQQPGuVPZagUpRKzU-D_6YCR2wc4w1AKUSesP6-KqdsGOw9-lnnhB1Xy0O5Sbe7dVUutqlDhHfqVbBeSELJjQik8QxOOx4kmKAK7WxhLSFcAgZugLg_O28unrQFowsiMWF5--uaZ4yND_WruT5s86kR6l9D3TV0SvEFbNU2MA</recordid><startdate>200712</startdate><enddate>200712</enddate><creator>Böhm, Christian</creator><creator>Faloutsos, Christos</creator><creator>Pan, Jia-Yu</creator><creator>Plant, Claudia</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>200712</creationdate><title>RIC</title><author>Böhm, Christian ; Faloutsos, Christos ; Pan, Jia-Yu ; Plant, Claudia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c110t-e5fae34614293f671b0c7c2f409f587ca8aa610df1f29062ebce9de67a0978663</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Böhm, Christian</creatorcontrib><creatorcontrib>Faloutsos, Christos</creatorcontrib><creatorcontrib>Pan, Jia-Yu</creatorcontrib><creatorcontrib>Plant, Claudia</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on knowledge discovery from data</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Böhm, Christian</au><au>Faloutsos, Christos</au><au>Pan, Jia-Yu</au><au>Plant, Claudia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>RIC: Parameter-free noise-robust clustering</atitle><jtitle>ACM transactions on knowledge discovery from data</jtitle><date>2007-12</date><risdate>2007</risdate><volume>1</volume><issue>3</issue><spage>10</spage><pages>10-</pages><issn>1556-4681</issn><eissn>1556-472X</eissn><abstract>How do we find a natural clustering of a real-world point set which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? As most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input parameters, and are sensitive to noise. In this article, we propose a robust framework for determining a natural clustering of a given dataset, based on the minimum description length (MDL) principle. The proposed framework, robust information-theoretic clustering (RIC) , is orthogonal to any known clustering algorithm: Given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously determines the most natural amount and shape (subspace) of the clusters. Our RIC method can be combined with any clustering technique ranging from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC is even able to purify and improve an initial coarse clustering, even if we start with very simple methods. In an extension, we propose a fully automatic stand-alone clustering method and efficiency improvements. RIC scales well with the dataset size. Extensive experiments on synthetic and real-world datasets validate the proposed RIC framework.</abstract><doi>10.1145/1297332.1297334</doi></addata></record>
fulltext fulltext
identifier ISSN: 1556-4681
ispartof ACM transactions on knowledge discovery from data, 2007-12, Vol.1 (3), p.10
issn 1556-4681
1556-472X
language eng
recordid cdi_crossref_primary_10_1145_1297332_1297334
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
title RIC: Parameter-free noise-robust clustering
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T21%3A47%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RIC:%20Parameter-free%20noise-robust%20clustering&rft.jtitle=ACM%20transactions%20on%20knowledge%20discovery%20from%20data&rft.au=B%C3%B6hm,%20Christian&rft.date=2007-12&rft.volume=1&rft.issue=3&rft.spage=10&rft.pages=10-&rft.issn=1556-4681&rft.eissn=1556-472X&rft_id=info:doi/10.1145/1297332.1297334&rft_dat=%3Ccrossref%3E10_1145_1297332_1297334%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c110t-e5fae34614293f671b0c7c2f409f587ca8aa610df1f29062ebce9de67a0978663%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true