TTLG - An Efficient Tensor Transposition Library for GPUs

Bibliographic Details
Main Authors: Vedurada, Jyothi, Suresh, Arjun, Rajam, Aravind Sukumaran, Kim, Jinsung, Hong, Changwan, Panyala, Ajay, Krishnamoorthy, Sriram, Nandivada, V. Krishna, Srivastava, Rohit Kumar, Sadayappan, P.
Format: Conference Proceeding
Language: English
Description
Summary: This paper presents a Tensor Transposition Library for GPUs (TTLG). A distinguishing feature of TTLG is that it also includes a performance prediction model, which can be used by higher-level optimizers that rely on tensor transposition. For example, tensor contractions are often implemented using the TTGT (Transpose-Transpose-GEMM-Transpose) approach: transpose the input tensors to a suitable layout, perform a high-performance matrix multiplication, and then transpose the result. The performance model is also used internally by TTLG to choose among alternative kernels and slicing/blocking parameters for the transposition. TTLG is compared with current state-of-the-art alternatives for GPUs. It achieves comparable or better transposition times in the "repeated-use" scenario and considerably better "single-use" performance.
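The TTGT approach mentioned in the abstract can be illustrated with a minimal NumPy sketch (this is an illustration of the general technique, not TTLG's API): transpose/reshape the tensors so the contraction becomes a single matrix multiply, then transpose the result back.

```python
import numpy as np

# Illustrative contraction (indices chosen for this sketch):
#   C[a,b,i,j] = sum_c A[c,a,b] * B[c,i,j]
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))   # A[c,a,b]
B = rng.standard_normal((4, 2, 6))   # B[c,i,j]

# Transpose step: bring the contracted index c to a matrix dimension
# and flatten the remaining indices (here c is already leading).
A_mat = A.reshape(4, 3 * 5)          # shape (c, a*b)
B_mat = B.reshape(4, 2 * 6)          # shape (c, i*j)

# GEMM step: (a*b, c) @ (c, i*j) -> (a*b, i*j)
C_mat = A_mat.T @ B_mat

# Final transpose/reshape back to the desired tensor layout C[a,b,i,j].
C = C_mat.reshape(3, 5, 2, 6)

# Verify against a direct contraction.
C_ref = np.einsum('cab,cij->abij', A, B)
assert np.allclose(C, C_ref)
```

In real TTGT pipelines the transpose steps involve genuine data permutations (not just reshapes), which is why a fast transposition library and a cost model for choosing layouts matter.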
ISSN:1530-2075
DOI:10.1109/IPDPS.2018.00067