Robust covariance estimation for distributed principal component analysis

Bibliographic Details
Published in: Metrika 2022-08, Vol. 85 (6), p. 707-732
Main Authors: Li, Kangqiang; Bao, Han; Zhang, Lixin
Format: Article
Language: English
Description
Summary: Fan et al. (Ann Stat 47(6):3009–3031, 2019) constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, the guarantees for their algorithm hold only for sub-Gaussian data. Motivated by this limitation, this paper extends their distributed PCA algorithm to heavy-tailed data by using the robust covariance matrix estimators of Minsker (Ann Stat 46(6A):2871–2903, 2018) and Ke et al. (Stat Sci 34(3):454–471, 2019). The theoretical results show that when the sampling distribution is symmetric innovation with bounded fourth moment, or asymmetric with finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to the rate obtained under sub-Gaussian tails. Extensive numerical experiments support the theoretical analysis and indicate that the algorithm is robust to heavy-tailed data and outliers.
ISSN: 0026-1335, 1435-926X
DOI: 10.1007/s00184-021-00848-9
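
Illustration: the idea in the summary can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Each server replaces the sample covariance with a norm-truncated (spectrum-wise truncated) covariance estimate, computes its top-k eigenvectors, and a central server averages the resulting projection matrices and extracts their top-k eigenvectors. The function names, the zero-mean assumption, and the fixed truncation level `tau` are all hypothetical choices made for this example; the record does not state how the paper handles centering or tuning.

```python
import numpy as np


def truncated_covariance(X, tau):
    """Norm-truncated covariance estimate for rows of X.

    Assumption: the rows are (approximately) zero-mean. Each outer product
    x x^T is shrunk so that its only nonzero eigenvalue, ||x||^2, is capped
    at tau.
    """
    sq_norms = np.sum(X**2, axis=1)
    # Avoid division by zero; an all-zero row contributes nothing anyway.
    safe = np.where(sq_norms > 0, sq_norms, 1.0)
    weights = np.minimum(sq_norms, tau) / safe
    return (X * weights[:, None]).T @ X / X.shape[0]


def top_k_eigvecs(S, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :k]


def distributed_robust_pca(shards, k, tau):
    """Average local projection matrices and return the top-k eigenvectors.

    `shards` is a list of (n_l, d) arrays, one per server (hypothetical setup).
    """
    d = shards[0].shape[1]
    avg_proj = np.zeros((d, d))
    for X in shards:
        V = top_k_eigvecs(truncated_covariance(X, tau), k)
        avg_proj += V @ V.T  # only this d x d projection is "communicated"
    avg_proj /= len(shards)
    return top_k_eigvecs(avg_proj, k)
```

In this sketch a very large `tau` recovers the ordinary second-moment matrix on each server, while a smaller `tau` limits the influence of observations with large norm.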