A distributed-memory hierarchical solver for general sparse linear systems

Bibliographic Details
Published in: Parallel Computing, 2018-05, Vol. 74 (C), p. 49-64
Main Authors: Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran; Boman, Erik G.; Darve, Eric
Format: Article
Language: English
Description
Summary:
• Derived a new formulation of a sequential hierarchical solver that compresses dense fill-in blocks.
• Proposed a new parallel algorithm for solving general sparse linear systems based on data decomposition.
• Implemented a task-based asynchronous scheme by exploiting data dependencies in our algorithm.
• Implemented a coloring scheme to extract concurrency in the execution.
• Provided benchmarks for various problems and an analysis of parallel scalability under different conditions.

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of the low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by each processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
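
To make the computation-to-communication claim concrete with a hypothetical example: if each processor owns a cubic subdomain with n grid points per side, the local factorization work scales with the volume (on the order of n^3 points) while the boundary data exchanged with neighbors scales with the surface area (on the order of 6n^2 points), so the ratio grows roughly like n, and communication becomes relatively cheaper as the local problem grows.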
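
The first highlight, compressing dense fill-in blocks, rests on the standard low-rank truncation idea. Below is a minimal Python sketch of that idea, not the authors' implementation: the function name compress_block, the SVD-based factorization, and the relative tolerance eps are illustrative assumptions; eps stands in for the accuracy knob that determines whether the solver acts as a direct solver or as a preconditioner.

# Minimal sketch (assumption, not the paper's code): SVD-based truncation
# of a dense fill-in block B to a relative tolerance eps.
import numpy as np

def compress_block(B, eps):
    # Thin SVD of the dense block.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Keep singular values above eps times the largest one (at least one term).
    k = max(int(np.sum(s > eps * s[0])), 1)
    # B is approximated by the first k terms: (U * s) @ Vt.
    return U[:, :k] * s[:k], Vt[:k, :].T

# Usage: a numerically rank-5 block compresses to 5 terms.
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))
U, V = compress_block(B, eps=1e-10)
print(U.shape[1])                                       # 5
print(np.linalg.norm(B - U @ V.T) / np.linalg.norm(B))  # small relative error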
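
The coloring highlight can be illustrated with greedy graph coloring, a common way to extract concurrency: subdomains that receive the same color share no edge in the adjacency graph, hence no direct data dependency, and can be processed in the same phase. This sketch shows the general technique only, not the paper's specific scheme; the adjacency-list format and greedy_coloring are hypothetical.

# Minimal sketch (assumption): greedy coloring of a subdomain adjacency
# graph; same-color subdomains can be eliminated concurrently.

def greedy_coloring(adj):
    # adj maps each vertex to an iterable of its neighbors.
    color = {}
    for v in adj:  # fixed vertex order; smarter orders can use fewer colors
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # smallest color unused by any colored neighbor
            c += 1
        color[v] = c
    return color

# Usage: four subdomains in a 2x2 grid; opposite corners share no
# boundary, so two colors suffice.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(greedy_coloring(adj))  # {0: 0, 1: 1, 2: 1, 3: 0}
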
ISSN: 0167-8191, 1872-7336
DOI: 10.1016/j.parco.2017.12.004