Batched matrix computations on hardware accelerators based on GPUs
Published in: The International Journal of High Performance Computing Applications, 2015-06, Vol. 29 (2), p. 193
Main Authors:
Format: Article
Language: English
Summary: Scientific applications require solvers that work on many small problems that are independent of each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, so there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems, which the authors call batched factorizations. The many applications that need this functionality could especially benefit from the use of GPUs, which are currently four to five times more energy efficient than multicore CPUs on important scientific workloads. This paper therefore describes the development of the most common one-sided factorizations, Cholesky, LU, and QR, for sets of small dense matrices. The algorithms the authors present, together with their implementations, are inherently parallel by design. Their approach is more efficient than a hybrid combination of multicore CPUs and GPUs for the problem sizes of interest in the application use cases.
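To illustrate the "batched" factorization pattern the abstract refers to (many independent small matrices factorized in a single GPU call), the sketch below uses the batched LU routine cublasDgetrfBatched from NVIDIA's cuBLAS library. This is not the authors' implementation; the matrix size, batch count, and test data are arbitrary example values, and error checking is omitted for brevity.

```cuda
// Minimal sketch: LU-factorize a batch of small n-by-n matrices in one call
// using cuBLAS's batched interface. Illustrative only; not the paper's code.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 32;        // each matrix is 32 x 32 (example value)
    const int batch = 1000;  // number of independent matrices (example value)
    const int lda = n;

    // Host: batch of column-major matrices, diagonally dominant so the
    // factorization is well defined.
    std::vector<double> hA((size_t)n * n * batch);
    for (int b = 0; b < batch; ++b)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < n; ++i)
                hA[(size_t)b * n * n + (size_t)j * lda + i] = (i == j) ? n : 1.0;

    // Device: one contiguous block holding all matrices, plus an array of
    // pointers to the start of each matrix, as the batched API expects.
    double *dA;
    cudaMalloc((void**)&dA, sizeof(double) * n * n * batch);
    cudaMemcpy(dA, hA.data(), sizeof(double) * n * n * batch, cudaMemcpyHostToDevice);

    std::vector<double*> hAptr(batch);
    for (int b = 0; b < batch; ++b) hAptr[b] = dA + (size_t)b * n * n;
    double **dAptr;
    cudaMalloc((void**)&dAptr, sizeof(double*) * batch);
    cudaMemcpy(dAptr, hAptr.data(), sizeof(double*) * batch, cudaMemcpyHostToDevice);

    int *dPiv, *dInfo;
    cudaMalloc((void**)&dPiv, sizeof(int) * n * batch);  // pivot indices per matrix
    cudaMalloc((void**)&dInfo, sizeof(int) * batch);     // per-matrix status codes

    cublasHandle_t handle;
    cublasCreate(&handle);

    // A single call factorizes the entire batch on the GPU.
    cublasStatus_t st = cublasDgetrfBatched(handle, n, dAptr, lda, dPiv, dInfo, batch);
    printf("cublasDgetrfBatched status: %d\n", (int)st);

    cublasDestroy(handle);
    cudaFree(dInfo); cudaFree(dPiv); cudaFree(dAptr); cudaFree(dA);
    return 0;
}
```

The key point of the batched interface is that the per-matrix work is far too small to saturate a GPU on its own, so the entire set of independent factorizations is submitted as one operation and scheduled across the device.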
ISSN: 1094-3420, 1741-2846
DOI: 10.1177/1094342014567546