
Sampling-based dimension reduction for subspace approximation with outliers

Bibliographic Details
Published in: Theoretical Computer Science, 2021-02, Vol. 858, pp. 100–113
Main Authors: Deshpande, Amit, Pratap, Rameshwar
Format: Article
Language:English
Summary:
•Consider the problem of finding low-rank approximations in the ℓp norm in the presence of outliers in the dataset, where p≥2.
•Our technique is sampling-based and yields additive and multiplicative approximation guarantees for the problem.
•We extend our results to M-estimator loss functions and to affine subspace approximation in the presence of outliers.

The subspace approximation problem with outliers is the following: given n points x1,x2,…,xn∈Rd, an integer 1≤k≤d, and an outlier parameter 0≤α≤1, find a k-dimensional linear subspace of Rd that minimizes the sum of squared distances to its nearest (1−α)n points. More generally, the ℓp subspace approximation problem with outliers minimizes the sum of p-th powers of distances instead of the sum of squared distances. Even the case of p=2, i.e., robust PCA, is non-trivial, and previous work requires additional assumptions on the input or generative models for it.

Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the (1−α)n inliers in the optimal solution are promised to lie exactly on a k-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard, and known algorithmic results for it require strong assumptions on the input, e.g., that any d outliers must be linearly independent.

In this paper, we show how to extend dimension reduction techniques and bi-criteria approximations based on sampling and coresets to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal k-dimensional subspace summed over the optimal (1−α)n inliers is at least δ times its squared error summed over all n points, for some 0
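The p=2 objective defined in the abstract (the sum of squared distances from a candidate k-dimensional subspace to its nearest (1−α)n points) can be evaluated directly. The following is a minimal sketch in plain Python, not the authors' algorithm; the function names are hypothetical and the candidate subspace is assumed to be given by an orthonormal basis:

```python
def sq_dist_to_subspace(x, basis):
    """Squared distance from point x to span(basis); basis vectors assumed orthonormal."""
    proj = [0.0] * len(x)
    for b in basis:
        c = sum(xi * bi for xi, bi in zip(x, b))  # coefficient of x along b
        proj = [p + c * bi for p, bi in zip(proj, b)]
    return sum((xi - pi) ** 2 for xi, pi in zip(x, proj))

def outlier_objective(points, basis, alpha):
    """Sum of squared distances over the nearest (1 - alpha) * n points, the presumed inliers."""
    n = len(points)
    n_inliers = n - int(alpha * n)  # keep the (1 - alpha) * n closest points
    dists = sorted(sq_dist_to_subspace(x, basis) for x in points)
    return sum(dists[:n_inliers])
```

For example, with three points on the x-axis and one far off-axis outlier, the x-axis (basis [(1.0, 0.0)]) achieves objective 0 once α is large enough that the outlier is discarded. Raising each squared distance to the power p/2 before summing gives the ℓp variant discussed in the abstract.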
ISSN: 0304-3975, 1879-2294
DOI: 10.1016/j.tcs.2021.01.021