On Consistency and Sparsity for Principal Components Analysis in High Dimensions

Bibliographic Details
Published in:	Journal of the American Statistical Association 2009-06, Vol.104 (486), p.682-693
Main Authors:	Johnstone, Iain M., Lu, Arthur Yu
Format:	Article
Language:	English
Subjects:	Applications; Coordinate systems; Covariance; Covariance matrices; Eigenvalues; Eigenvector estimation; Eigenvectors; Exact sciences and technology; General topics; Mathematical vectors; Mathematics; Multivariate analysis; Parametric inference; Principal components analysis; Probability and statistics; Reduction of dimension; Regularization; Sciences and techniques of general use; Signal noise; Statistical discrepancies; Statistics; Theory and Methods; Threshing; Thresholding; Variable selection

Description
Summary:	Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
ISSN:	0162-1459 1537-274X
DOI:	10.1198/jasa.2009.0121
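
Illustration:	The summary describes selecting the coordinates with the largest sample variances and then running PCA on that subset only. The following is a minimal sketch of that idea, not the authors' published procedure. It assumes the data are already expressed in a basis where the signal is sparse, and it keeps a fixed number `k` of coordinates; the summary does not state the paper's own selection rule. The function names (`select_top_variance`, `leading_eigenvector`, `subset_pca`) and the use of power iteration are illustrative choices, and only the Python standard library is used.

```python
import math
import random
from statistics import fmean, variance


def select_top_variance(rows, k):
    """Return the sorted indices of the k coordinates with the largest sample variance."""
    p = len(rows[0])
    variances = [variance([row[j] for row in rows]) for j in range(p)]
    top = sorted(range(p), key=lambda j: variances[j], reverse=True)[:k]
    return sorted(top)


def leading_eigenvector(cov, iters=500, seed=0):
    """Power iteration for the leading unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    m = len(cov)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def subset_pca(rows, k):
    """Estimate the leading principal component using only the k highest-variance coordinates.

    rows: list of n observations, each a list of p floats.
    k:    number of coordinates to keep (an assumed tuning parameter).
    Returns (v, selected): v is a length-p unit vector that is zero outside
    the selected coordinates; selected is the list of retained indices.
    """
    n, p = len(rows), len(rows[0])
    selected = select_top_variance(rows, k)

    # Centre the retained coordinates and form their k x k sample covariance.
    means = [fmean(row[j] for row in rows) for j in selected]
    centered = [[row[j] - mu for j, mu in zip(selected, means)] for row in rows]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
        for a in range(k)
    ]

    # PCA on the reduced problem, then embed the result back into p dimensions.
    v_sub = leading_eigenvector(cov)
    v = [0.0] * p
    for j, x in zip(selected, v_sub):
        v[j] = x
    return v, selected
```

Calling `subset_pca(rows, k)` on an n-by-p list of observations returns a length-p unit vector together with the retained indices. Because the eigenproblem is solved in k dimensions rather than p, the noise accumulated across the unselected coordinates does not enter the estimate, which is the intuition behind the consistency claim in the summary.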