Clustering for metric and nonmetric distance measures

We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors $D(P,C) = \sum_{p \in P} \min_{c \in C} D(p,c)$ is minimized. The main result in this article can be sta...

Bibliographic Details
Published in: ACM transactions on algorithms 2010-08, Vol.6 (4), p.1-26
Main Authors: Ackermann, Marcel R., Blömer, Johannes, Sohler, Christian
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c273t-a1ca638e140a0ac745a76aa532b972f7d4c976c0674f433b697312e05e79cfc93
cites cdi_FETCH-LOGICAL-c273t-a1ca638e140a0ac745a76aa532b972f7d4c976c0674f433b697312e05e79cfc93
container_end_page 26
container_issue 4
container_start_page 1
container_title ACM transactions on algorithms
container_volume 6
creator Ackermann, Marcel R.
Blömer, Johannes
Sohler, Christian
description We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors $D(P,C) = \sum_{p \in P} \min_{c \in C} D(p,c)$ is minimized. The main result in this article can be stated as follows: There exists a $(1+\epsilon)$-approximation algorithm for the k-median problem with respect to D, if the 1-median problem can be approximated within a factor of $(1+\epsilon)$ by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time $n \cdot 2^{O(mk \log(mk/\epsilon))}$, where m is a constant that depends only on $\epsilon$ and D. Using this characterization, we obtain the first linear-time $(1+\epsilon)$-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
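The objective in the description can be illustrated with a minimal Python sketch. It is not the article's algorithm: it only evaluates the sum of errors D(P,C) for a pluggable dissimilarity measure and shows the sampling idea for the 1-median problem in the squared Euclidean case, where the exact 1-median of a point set is its centroid. The function names, the `sample_size` parameter, and the choice of uniform sampling with replacement are assumptions made for illustration.

```python
import math
import random


def squared_euclidean(p, c):
    """Squared Euclidean distance (the k-means dissimilarity)."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kl_divergence(p, c):
    """Kullback-Leibler divergence of p from c.

    Assumes both are probability vectors and c has strictly positive
    entries; coordinates with p_i == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / ci) for pi, ci in zip(p, c) if pi > 0)


def clustering_cost(points, centers, dissimilarity):
    """Sum of errors D(P, C): each point pays for its cheapest center."""
    return sum(min(dissimilarity(p, c) for c in centers) for p in points)


def centroid(points):
    """Coordinate-wise mean; the exact 1-median for squared Euclidean."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sampled_one_median(points, sample_size, rng):
    """Solve 1-median exactly on a small uniform sample (illustrative).

    Sampling with replacement and the sample size are assumptions of
    this sketch, not parameters taken from the article.
    """
    sample = [rng.choice(points) for _ in range(sample_size)]
    return centroid(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]
    exact = centroid(pts)
    approx = sampled_one_median(pts, sample_size=50, rng=rng)
    print(clustering_cost(pts, [exact], squared_euclidean))
    print(clustering_cost(pts, [approx], squared_euclidean))
```

Passing `kl_divergence` instead of `squared_euclidean` to `clustering_cost` evaluates the same objective for a nonmetric measure, which is the setting the title refers to.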
doi_str_mv 10.1145/1824777.1824779
format article
fulltext fulltext
identifier ISSN: 1549-6325
ispartof ACM transactions on algorithms, 2010-08, Vol.6 (4), p.1-26
issn 1549-6325
1549-6333
language eng
recordid cdi_proquest_miscellaneous_963852427
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Algorithms
Approximation
Clustering
Divergence
Entropy
Error analysis
Mathematical analysis
Metric space
title Clustering for metric and nonmetric distance measures
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T06%3A59%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20for%20metric%20and%20nonmetric%20distance%20measures&rft.jtitle=ACM%20transactions%20on%20algorithms&rft.au=Ackermann,%20Marcel%20R.&rft.date=2010-08&rft.volume=6&rft.issue=4&rft.spage=1&rft.epage=26&rft.pages=1-26&rft.issn=1549-6325&rft.eissn=1549-6333&rft_id=info:doi/10.1145/1824777.1824779&rft_dat=%3Cproquest_cross%3E963852427%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c273t-a1ca638e140a0ac745a76aa532b972f7d4c976c0674f433b697312e05e79cfc93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=963852427&rft_id=info:pmid/&rfr_iscdi=true