Clustering for metric and nonmetric distance measures
We study a generalization of the $k$-median problem with respect to an arbitrary dissimilarity measure $D$. Given a finite set $P$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $D(P,C) = \sum_{p \in P} \min_{c \in C} D(p,c)$ is minimized. The main result in this article can be sta...
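
The objective in the abstract can be written out directly. The following is a minimal sketch, not code from the article: the function names, the sample data, and the choice of absolute difference as the dissimilarity are all assumptions made for illustration. The dissimilarity is passed in as a callable because the abstract allows an arbitrary measure D, which need not be symmetric, so the argument order (point first, center second) is kept fixed.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def clustering_cost(
    points: Sequence[T],
    centers: Sequence[T],
    dissimilarity: Callable[[T, T], float],
) -> float:
    """Return D(P, C): the sum over p in P of the minimum over c in C of D(p, c)."""
    if not centers:
        raise ValueError("at least one center is required")
    # Each point pays the dissimilarity to its cheapest center.
    return sum(min(dissimilarity(p, c) for c in centers) for p in points)


def absolute_difference(p: float, c: float) -> float:
    """Illustrative dissimilarity on the real line (an assumption, not from the article)."""
    return abs(p - c)


# Hypothetical data: five points on a line, k = 2 centers.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
centers = [1.0, 10.0]
cost = clustering_cost(points, centers, absolute_difference)
print(cost)
```

Swapping in a different callable (for example, a divergence that is not symmetric) changes the measure without touching the cost function.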
Published in: | ACM transactions on algorithms 2010-08, Vol.6 (4), p.1-26 |
---|---|
Main Authors: | Ackermann, Marcel R.; Blömer, Johannes; Sohler, Christian |
Format: | Article |
Language: | English |
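Subjects: | Algorithms; Approximation; Clustering; Divergence; Entropy; Error analysis; Mathematical analysis; Metric space |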
cited_by | cdi_FETCH-LOGICAL-c273t-a1ca638e140a0ac745a76aa532b972f7d4c976c0674f433b697312e05e79cfc93 |
---|---|
cites | cdi_FETCH-LOGICAL-c273t-a1ca638e140a0ac745a76aa532b972f7d4c976c0674f433b697312e05e79cfc93 |
container_end_page | 26 |
container_issue | 4 |
container_start_page | 1 |
container_title | ACM transactions on algorithms |
container_volume | 6 |
creator | Ackermann, Marcel R.; Blömer, Johannes; Sohler, Christian |
description | We study a generalization of the $k$-median problem with respect to an arbitrary dissimilarity measure $D$. Given a finite set $P$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $D(P,C) = \sum_{p \in P} \min_{c \in C} D(p,c)$ is minimized. The main result in this article can be stated as follows: There exists a $(1+\epsilon)$-approximation algorithm for the $k$-median problem with respect to $D$, if the 1-median problem can be approximated within a factor of $(1+\epsilon)$ by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time $n \, 2^{O(mk \log(mk/\epsilon))}$, where $m$ is a constant that depends only on $\epsilon$ and $D$. Using this characterization, we obtain the first linear time $(1+\epsilon)$-approximation algorithms for the $k$-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean $k$-median problem and the Euclidean $k$-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004]. |
doi_str_mv | 10.1145/1824777.1824779 |
format | article |
fulltext | fulltext |
identifier | ISSN: 1549-6325 |
ispartof | ACM transactions on algorithms, 2010-08, Vol.6 (4), p.1-26 |
issn | ISSN: 1549-6325; EISSN: 1549-6333 |
language | eng |
recordid | cdi_proquest_miscellaneous_963852427 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list) |
subjects | Algorithms; Approximation; Clustering; Divergence; Entropy; Error analysis; Mathematical analysis; Metric space |
title | Clustering for metric and nonmetric distance measures |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T06%3A59%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20for%20metric%20and%20nonmetric%20distance%20measures&rft.jtitle=ACM%20transactions%20on%20algorithms&rft.au=Ackermann,%20Marcel%20R.&rft.date=2010-08&rft.volume=6&rft.issue=4&rft.spage=1&rft.epage=26&rft.pages=1-26&rft.issn=1549-6325&rft.eissn=1549-6333&rft_id=info:doi/10.1145/1824777.1824779&rft_dat=%3Cproquest_cross%3E963852427%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c273t-a1ca638e140a0ac745a76aa532b972f7d4c976c0674f433b697312e05e79cfc93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=963852427&rft_id=info:pmid/&rfr_iscdi=true |
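
The sampling condition named in the abstract (approximating the 1-median by solving it exactly on a random sample of constant size) can be illustrated for one special case. This is a rough sketch under stated assumptions, not the article's algorithm: it uses squared Euclidean distance (the $k$-means setting), where the exact 1-median of a point set is its centroid, and the function names, the sample size of 50, and the synthetic Gaussian data are all hypothetical choices. It shows only the 1-median step, not the full $k$-median procedure.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_euclidean(p: Point, c: Point) -> float:
    """Squared Euclidean distance between a point and a center."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def centroid(points: Sequence[Point]) -> List[float]:
    """Exact 1-median under squared Euclidean distance: the coordinate-wise mean."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def cost_1_median(points: Sequence[Point], center: Point) -> float:
    """Sum of squared distances from every point to a single center."""
    return sum(squared_euclidean(p, center) for p in points)


def sampled_1_median(
    points: Sequence[Point], sample_size: int, rng: random.Random
) -> List[float]:
    """Draw a uniform sample (with replacement) and solve the 1-median exactly on it."""
    sample = [rng.choice(points) for _ in range(sample_size)]
    return centroid(sample)


# Hypothetical data: 1000 two-dimensional Gaussian points, fixed seed for repeatability.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1000)]

approx_center = sampled_1_median(points, sample_size=50, rng=rng)
optimal_center = centroid(points)

# Ratio of the sampled solution's cost to the optimal 1-median cost (at least 1).
ratio = cost_1_median(points, approx_center) / cost_1_median(points, optimal_center)
print(ratio)
```

The sample size here is fixed independently of the number of points, which mirrors the "constant size" requirement; how large it must be for a given $\epsilon$ and $D$ is the subject of the article and is not derived in this sketch.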