One-Pass Additive-Error Subset Selection for ℓp Subspace Approximation and (k, p)-Clustering
Published in: Algorithmica, October 2023, Vol. 85 (10), pp. 3144-3167
Main Authors: Deshpande, Amit; Pratap, Rameshwar
Format: Article
Language: English
Summary: We consider the problem of subset selection for ℓp subspace approximation and (k, p)-clustering. Our aim is to efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. For ℓp subspace approximation, the previously known subset selection algorithms based on volume sampling and adaptive sampling, proposed in Deshpande and Varadarajan (STOC'07, 2007) for the general case of p ∈ [1, ∞), require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓp subspace approximation, for any p ∈ [1, ∞). Earlier subset selection algorithms that give a one-pass multiplicative (1 + ϵ) approximation work only in special cases: Cohen et al. (SODA'17, 2017) give a one-pass subset selection with a multiplicative (1 + ϵ) approximation guarantee for the special case of ℓ2 subspace approximation, and Mahabadi et al. (STOC'20, 2020) give a one-pass noisy subset selection with a (1 + ϵ) approximation guarantee for ℓp subspace approximation when p ∈ {1, 2}. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞).
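For concreteness, the ℓp subspace approximation objective can be stated as follows. This is the standard formulation; the notation X = {x_1, …, x_n} ⊂ R^d for the input and k for the target dimension is ours, not taken from the record:

```latex
% \ell_p subspace approximation: find a k-dimensional linear subspace V
% minimizing the sum of p-th powers of Euclidean distances to V.
\min_{V \,:\, \dim(V) = k} \;\; \sum_{i=1}^{n} \operatorname{dist}(x_i, V)^{p},
\qquad p \in [1, \infty).
```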
We also consider (k, p)-clustering, where the task is to group the data points into k clusters such that the sum of distances from points to their nearest cluster centers, raised to the power p, is minimized, for p ∈ [1, ∞); the objective is written out below.
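In the same (assumed) notation, the (k, p)-clustering objective is the standard one below; p = 2 recovers k-means and p = 1 recovers k-median:

```latex
% (k, p)-clustering: choose k centers minimizing the sum of p-th powers
% of distances from each point to its nearest center.
\min_{C \subset \mathbb{R}^d,\; |C| = k} \;\;
\sum_{i=1}^{n} \min_{c \in C} \lVert x_i - c \rVert^{p}.
```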
The subset selection algorithms for this problem are based on D^p sampling due to Wei (NIPS'16, 2016), which extends the D^2 sampling proposed in Arthur and Vassilvitskii (SODA'07, 2007). Due to the inherently adaptive nature of D^p sampling, these algorithms require multiple passes over the input; a sketch of this adaptivity follows this paragraph. In this work, we propose a one-pass subset selection for (k, p)-clustering that gives a constant-factor approximation with respect to the optimal solution, with an additive approximation guarantee. Bachem et al. (NIPS'16, 2016) also give a one-pass subset selection for k-means, i.e., for p = 2; our result, however, solves the more general problem for any p ∈ [1, ∞).
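To show why exact D^p sampling is inherently multi-pass, here is a minimal sketch in Python. It is our illustration, not the authors' code, and the name `dp_sampling` is ours: every draw is proportional to the current p-th-power distance to the points chosen so far, so each draw triggers a fresh scan of the data.

```python
import numpy as np

def dp_sampling(X, m, p=2, seed=None):
    """Select m points from X (an n x d array) by adaptive D^p sampling.

    Each round draws a point with probability proportional to its
    p-th-power distance to the nearest already-chosen point, so every
    round needs a full pass over the data (the adaptivity noted above).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    chosen = [rng.integers(n)]  # first point: uniform at random
    # dist_p[i] = min distance^p from X[i] to the chosen set so far
    dist_p = np.linalg.norm(X - X[chosen[0]], axis=1) ** p
    for _ in range(m - 1):
        probs = dist_p / dist_p.sum()
        i = rng.choice(n, p=probs)  # D^p-proportional draw (full pass)
        chosen.append(i)
        dist_p = np.minimum(dist_p, np.linalg.norm(X - X[i], axis=1) ** p)
    return X[chosen]
```

For p = 2 this is exactly k-means++ seeding; the chained dependence of `dist_p` on all previous draws is what a one-pass algorithm must avoid.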
At the core, our contribution lies in showing a one-pass MCMC-based subset selection algorithm such that the cost it incurs on the sampled points closely approximates the corresponding optimal cost, with high probability.
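The flavor of the MCMC step can be illustrated with a short Metropolis-Hastings chain, in the spirit of the MCMC approximation to D^2 sampling by Bachem et al.; this is a hedged sketch of the general technique, not the paper's algorithm, and `mcmc_dp_draw` and its parameters are hypothetical names of ours.

```python
import numpy as np

def mcmc_dp_draw(X, center, chain_len, p=2, seed=None):
    """Approximate one D^p-proportional draw w.r.t. a fixed center
    using a Metropolis-Hastings chain with uniform proposals.

    The chain's stationary distribution is proportional to the D^p
    weights ||x - center||^p, so a bounded-length chain can stand in
    for an exact draw that would otherwise need a full pass.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    cur = rng.integers(n)
    cur_w = np.linalg.norm(X[cur] - center) ** p
    for _ in range(chain_len - 1):
        cand = rng.integers(n)  # uniform proposal
        cand_w = np.linalg.norm(X[cand] - center) ** p
        # Accept with probability min(1, cand_w / cur_w)
        if cur_w == 0 or rng.random() < cand_w / cur_w:
            cur, cur_w = cand, cand_w
    return cur
```

With uniform proposals, the acceptance ratio min(1, w(cand)/w(cur)) targets the D^p distribution while touching only `chain_len` points per draw; how such chains are orchestrated over a single pass of the stream is the subject of the paper and is not reproduced here.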
ISSN: 0178-4617, 1432-0541
DOI: 10.1007/s00453-023-01124-0