Communication-avoiding Cholesky-QR2 for rectangular matrices

Bibliographic Details
Published in:	arXiv.org 2019-06
Main Authors:	Hutter, Edward; Solomonik, Edgar
Format:	Article
Language:	English
Subjects:	Algorithms; Communication; Computer memory; Eigenvalues; Interprocessor communication; Microprocessors; Nodes; Parallel processing; Scaling; Supercomputers; Synchronism

Description
Summary:	Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show its effectiveness for a wide range of matrix sizes. Our algorithm executes over a 3D processor grid, the dimensions of which can be tuned to trade off the costs of synchronization, interprocessor communication, computational work, and memory footprint. We implement this algorithm, yielding a code that can achieve a factor of \(\Theta(P^{1/6})\) less interprocessor communication on \(P\) processors than any previous parallel QR implementation. Our performance study on Intel Knights Landing and Cray XE supercomputers demonstrates the effectiveness of this CholeskyQR2 parallelization on a large number of nodes. Specifically, relative to ScaLAPACK's QR, on 1024 nodes of Stampede2, our CholeskyQR2 implementation is faster by 2.6x-3.3x in strong scaling tests and by 1.1x-1.9x in weak scaling tests.
ISSN:	2331-8422
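
Illustration: the summary names the CholeskyQR2 algorithm without describing it. As a rough aid, the following is a minimal sequential sketch of the standard CholeskyQR2 idea (form the Gram matrix, take its Cholesky factor, solve for Q, then repeat once on Q to restore orthogonality). It is not the authors' 3D-grid parallel implementation; the function names are hypothetical, matrices are plain lists of rows, and the input is assumed to have full column rank and to be reasonably well conditioned.

```python
import math

def gram(A):
    """Return the n x n Gram matrix A^T A of an m x n matrix A (list of rows)."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def cholesky_upper(G):
    """Return upper-triangular R with R^T R = G, for symmetric positive definite G."""
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = G[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if s <= 0.0:
            raise ValueError("Gram matrix is not numerically positive definite")
        R[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            R[i][j] = (G[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

def solve_right_upper(A, R):
    """Return Q such that Q R = A, using forward substitution on each row of A."""
    n = len(R)
    Q = []
    for a in A:
        q = [0.0] * n
        for j in range(n):
            q[j] = (a[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q

def matmul(X, Y):
    """Plain dense matrix product X Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def cholesky_qr(A):
    """One CholeskyQR pass: R = chol(A^T A), Q = A R^{-1}."""
    R = cholesky_upper(gram(A))
    return solve_right_upper(A, R), R

def cholesky_qr2(A):
    """Two CholeskyQR passes; the second pass reorthogonalizes Q from the first."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, matmul(R2, R1)  # A = Q (R2 R1), with R2 R1 upper triangular

if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    Q, R = cholesky_qr2(A)
    print("Q^T Q =", gram(Q))
    print("R =", R)
```

In a distributed setting the dominant communication is typically the reduction that forms the Gram matrix, which is the kind of cost the processor-grid tuning in the summary targets. The sketch also makes the method's known limitation visible: forming A^T A squares the condition number, so the Cholesky step can fail for ill-conditioned inputs.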