One-Pass Additive-Error Subset Selection for ℓp Subspace Approximation and (k, p)-Clustering

Bibliographic Details
Published in: Algorithmica 2023-10, Vol. 85(10), pp. 3144–3167
Main Authors: Deshpande, Amit; Pratap, Rameshwar
Format: Article
Language:English
Description
Summary: We consider the problem of subset selection for ℓ_p subspace approximation and (k, p)-clustering. Our aim is to efficiently find a small subset of data points such that solving the problem optimally on this subset gives a good approximation to solving it optimally on the original input. For ℓ_p subspace approximation, the previously known subset selection algorithms based on volume sampling and adaptive sampling, proposed in Deshpande and Varadarajan (STOC'07, 2007) for the general case of p ∈ [1, ∞), require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓ_p subspace approximation, for any p ∈ [1, ∞). Earlier subset selection algorithms that give a one-pass multiplicative (1 + ε) approximation work only in special cases: Cohen et al. (SODA'17, 2017) give a one-pass subset selection with a multiplicative (1 + ε) approximation guarantee for the special case of ℓ_2 subspace approximation, and Mahabadi et al. (STOC'20, 2020) give a one-pass noisy subset selection with a (1 + ε) approximation guarantee for ℓ_p subspace approximation when p ∈ {1, 2}. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞). We also consider (k, p)-clustering, where the task is to group the data points into k clusters such that the sum of distances from points to their cluster centers, raised to the power p, is minimized, for p ∈ [1, ∞). The known subset selection algorithms for this problem are based on D^p sampling due to Wei (NIPS'16, 2016), which extends the D^2 sampling proposed in Arthur and Vassilvitskii (SODA'07, 2007). Due to the inherently adaptive nature of D^p sampling, these algorithms require multiple passes over the input. In this work, we give a one-pass subset selection for (k, p)-clustering whose cost is within a constant factor of the optimal solution, up to an additive error. Bachem et al. (NIPS'16, 2016) also give a one-pass subset selection for k-means, i.e., the case p = 2; our result handles the more general problem for any p ∈ [1, ∞). At its core, our contribution is a one-pass MCMC-based subset selection algorithm whose cost on the sampled points closely approximates the corresponding optimal cost, with high probability.
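For reference, the two objectives discussed in the abstract can be written out as follows, using the standard formulations over an input X = {x_1, …, x_n} ⊂ R^d (note that some papers state the subspace objective with a 1/p-th root; this rendering is ours, not taken from the paper):

```latex
% l_p subspace approximation: find the k-dimensional linear subspace V
% minimizing the sum of p-th powers of point-to-subspace distances
% (some papers instead minimize the 1/p-th root of this sum):
\min_{\substack{V \subseteq \mathbb{R}^d \\ \dim V = k}} \;
  \sum_{i=1}^{n} \operatorname{dist}(x_i, V)^p

% (k, p)-clustering: find k centers C minimizing the sum of p-th powers
% of point-to-nearest-center distances (p = 2 recovers k-means):
\min_{\substack{C \subseteq \mathbb{R}^d \\ |C| = k}} \;
  \sum_{i=1}^{n} \min_{c \in C} \lVert x_i - c \rVert^p
```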
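To see why D^p sampling is inherently multi-pass, here is a rough illustrative sketch of the adaptive scheme (not the algorithm from the paper; the function name and parameters are our own). Each new point is drawn with probability proportional to its p-th power distance from the points already chosen, so every pick forces a fresh scan of the data:

```python
import numpy as np

def dp_sample(X, k, p, seed=None):
    """Adaptive D^p sampling: pick k points from X, each with probability
    proportional to its distance (raised to the power p) from the points
    already chosen. With p = 2 this is exactly k-means++ seeding."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    chosen = [rng.integers(n)]                       # first point: uniform
    d = np.linalg.norm(X - X[chosen[0]], axis=1) ** p
    for _ in range(k - 1):
        i = rng.choice(n, p=d / d.sum())             # sample ∝ dist^p
        chosen.append(i)
        # Each round re-scans all of X to update nearest-point distances:
        # this is why D^p sampling needs multiple passes in a stream.
        d = np.minimum(d, np.linalg.norm(X - X[i], axis=1) ** p)
    return X[chosen]
```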
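The one-pass results replace this adaptive re-scanning with an MCMC sampler. Purely as an illustration of the idea, the sketch below runs a Metropolis-Hastings chain in the spirit of the k-MC² line of work by Bachem et al.; it is not the authors' algorithm, and the chain length m and all names are hypothetical:

```python
import numpy as np

def mcmc_dp_sample(X, k, p, m=200, seed=None):
    """Approximate D^p sampling with a Metropolis-Hastings chain: instead
    of recomputing all n distances after every pick, walk a short chain
    of length m whose stationary distribution approximates the D^p
    distribution over points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                   # first center: uniform

    def cost(x):
        # p-th power of the distance from x to its nearest current center.
        return min(np.linalg.norm(x - c) ** p for c in centers)

    for _ in range(k - 1):
        cur = X[rng.integers(n)]
        cur_cost = cost(cur)
        for _ in range(m):
            cand = X[rng.integers(n)]
            cand_cost = cost(cand)
            # Accept with probability min(1, cand_cost / cur_cost);
            # the max() guards against a zero-cost current state.
            if rng.random() * max(cur_cost, 1e-12) < cand_cost:
                cur, cur_cost = cand, cand_cost
        centers.append(cur)
    return np.array(centers)
```

The point of the chain is that each step evaluates distances only for the single candidate drawn, rather than re-scanning the whole dataset after every pick, which is what makes a one-pass, streaming implementation plausible.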
ISSN: 0178-4617
eISSN: 1432-0541
DOI: 10.1007/s00453-023-01124-0