Batched matrix computations on hardware accelerators based on GPUs

Bibliographic Details
Published in: The International Journal of High Performance Computing Applications, 2015-06, Vol. 29 (2), p. 193
Main Authors: Haidar, Azzam, Dong, Tingxing, Luszczek, Piotr, Tomov, Stanimire, Dongarra, Jack
Format: Article
Language: English
Description
Summary: Scientific applications require solvers that work on many small problems that are independent of each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, so there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems, which the authors call batched factorizations. The many applications that need this functionality could especially benefit from GPUs, which are currently four to five times more energy-efficient than multicore CPUs on important scientific workloads. This paper consequently describes the development of the most common one-sided factorizations, Cholesky, LU, and QR, for sets of small dense matrices. The algorithms the authors present, together with their implementations, are by design inherently parallel. For the problem sizes of interest in the application use cases, their approach is more efficient than approaches that combine multicore CPUs and GPUs.
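To illustrate the batched idea the abstract describes (a single call factoring many small, independent matrices), NumPy's stacked linear-algebra routines offer a CPU-side analogue. This is a sketch of the concept only, not the authors' GPU implementation; the batch size and matrix dimension below are arbitrary choices for illustration:

```python
import numpy as np

# A batch of 1000 small (8x8) symmetric positive-definite matrices,
# stacked along the leading axis: shape (batch, n, n).
rng = np.random.default_rng(0)
batch, n = 1000, 8
G = rng.standard_normal((batch, n, n))
A = G @ G.transpose(0, 2, 1) + n * np.eye(n)  # SPD by construction

# One "batched" call: cholesky factors every matrix in the stack at once,
# rather than looping over 1000 separate small factorizations.
L = np.linalg.cholesky(A)  # shape (1000, 8, 8), lower triangular

# Verify L @ L^T == A across the whole batch.
err = np.max(np.abs(L @ L.transpose(0, 2, 1) - A))
print(err < 1e-10)  # -> True
```

The same stacked-call pattern is what batched GPU kernels exploit: each small factorization is independent, so the whole batch can be dispatched in one launch to keep a throughput-oriented device saturated.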
ISSN: 1094-3420, 1741-2846
DOI: 10.1177/1094342014567546