Pipelined, Flexible Krylov Subspace Methods

Bibliographic Details
Published in: arXiv.org, 2016-09
Main Authors: Sanan, Patrick; Schnepp, Sascha M.; May, Dave A.
Format: Article
Language: English
Description
Summary: We present variants of the Conjugate Gradient (CG), Conjugate Residual (CR), and Generalized Minimal Residual (GMRES) methods which are both pipelined and flexible. These allow computation of inner products and norms to be overlapped with operator and nonlinear or nondeterministic preconditioner application. The methods are hence aimed at hiding network latencies and synchronizations which can become computational bottlenecks in Krylov methods on extreme-scale systems or in the strong-scaling limit. The new variants are not arithmetically equivalent to their base flexible Krylov methods, but are chosen to be similarly performant in a realistic use case: the application of strong nonlinear preconditioners to large problems which require many Krylov iterations. We provide scalable implementations of our methods as contributions to the PETSc package and demonstrate their effectiveness with practical examples derived from models of mantle convection and lithospheric dynamics with heterogeneous viscosity structure. These represent challenging problems where multiscale nonlinear preconditioners are required for the current state-of-the-art algorithms, and they are hence amenable to acceleration with our new techniques. Large-scale tests are performed in the strong-scaling regime on a contemporary leadership supercomputer, where speedups approaching, and even exceeding, \(2\times\) can be observed. We conclude by analyzing our new methods with a performance model targeted at future exascale machines.
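
To illustrate how such solvers are used downstream, the sketch below solves a small 1-D Laplacian with PETSc and selects a pipelined flexible Krylov variant. This is a minimal sketch, not part of the record: it assumes a recent PETSc (for the PetscCall error-checking macro) and assumes that the PETSc KSP type name "pipefcg" corresponds to the pipelined flexible CG method described in the summary.

static char help[] = "Solves a 1-D Laplacian with a pipelined, flexible Krylov method.\n";

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 100, Istart, Iend;

  PetscCall(PetscInitialize(&argc, &argv, NULL, help));

  /* Assemble the standard tridiagonal 1-D Laplacian in parallel */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Select the pipelined flexible CG type; because the method is
     flexible, the preconditioner applied may vary between iterations */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPPIPEFCG)); /* assumed mapping to the pipelined flexible CG variant */
  PetscCall(KSPSetFromOptions(ksp));      /* allows switching the variant at runtime */
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Because the type is also settable from the command line (e.g. -ksp_type pipefgmres via KSPSetFromOptions), variants can be compared without recompiling. Note that the latency hiding described in the summary relies on overlapping global reductions with operator and preconditioner application, so in practice an MPI implementation supporting nonblocking collectives (MPI-3 MPI_Iallreduce) with asynchronous progress is generally needed for the overlap to yield speedups.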
ISSN: 2331-8422
DOI: 10.48550/arxiv.1511.07226