
AutoSparse: A Source-to-Source Format and Schedule Auto-Tuning Framework for Sparse Tensor Program

Bibliographic Details
Main Authors: Qu, Xiangjun; Gong, Lei; Lou, Wenqi; Cheng, Qianyu; Chen, Xianglan; Wang, Chao; Zhou, Xuehai
Format: Conference Proceeding
Language: English
Description
Summary: Sparse tensor computation plays a crucial role in modern deep learning workloads, and its high computational cost creates strong demand for high-performance operators. However, developing high-performance sparse operators is exceptionally challenging and tedious. Existing vendor operator libraries fail to keep pace with evolving algorithmic trends. Sparse tensor compilers simplify operator development and optimization, but existing work either requires significant engineering effort for tuning or suffers from limitations in search space and search strategy, which creates unavoidable cost and efficiency issues. In this paper, we propose AutoSparse, a source-to-source auto-tuning framework that targets the sparse format and schedule of sparse tensor programs. First, AutoSparse's front-end provides a sparse tensor DSL based on a dynamic computational graph, together with a scheme that extracts the computational pattern of a sparse tensor program and automatically generates its design space. Second, AutoSparse's back-end uses an adaptive exploration strategy based on reinforcement learning and heuristic algorithms to find the optimal format and schedule configuration in a large-scale design space. Compared to prior work, developers using AutoSparse do not need to specify the tuning design space or rely on any compilation or hardware knowledge. We use the SuiteSparse dataset to compare against four state-of-the-art baselines: the high-performance operator library MKL, the manual optimization scheme ASpT, and the auto-tuning-based frameworks TVM-S and WACO. The results demonstrate that AutoSparse achieves average speedups of 1.92-2.48×, 1.19-6.34×, and 1.47-2.23× for the SpMV, SpMM, and SDDMM operators, respectively. We will open-source AutoSparse at https://github.com/Qu-Xiangjun/AutoSparse.
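The abstract describes the back-end search only at a high level. As a purely illustrative sketch (not AutoSparse's actual API; all names below are hypothetical), the following Python snippet shows the simplest form of empirical format tuning for SpMV with SciPy: benchmark a few candidate storage formats on the target matrix and keep the fastest. AutoSparse's real design space also covers schedule parameters and is explored with reinforcement learning plus heuristics rather than exhaustive measurement.

```python
# Hypothetical illustration of empirical format tuning for SpMV.
# Not AutoSparse code: it only conveys the idea of searching a
# (format, schedule) design space by measuring candidate configurations.
import time
import numpy as np
import scipy.sparse as sp

def benchmark_spmv(matrix, x, repeats=20):
    """Median wall-clock time of y = A @ x for one stored format."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        matrix @ x
        times.append(time.perf_counter() - start)
    return float(np.median(times))

def tune_format(coo, x, candidates=("csr", "csc", "bsr", "coo")):
    """Exhaustively measure candidate storage formats; return the fastest."""
    results = {}
    for fmt in candidates:
        converted = coo.asformat(fmt)  # conversion cost ignored in this sketch
        results[fmt] = benchmark_spmv(converted, x)
    best = min(results, key=results.get)
    return best, results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = sp.random(4096, 4096, density=0.01, format="coo", random_state=0)
    x = rng.standard_normal(4096)
    best, results = tune_format(A, x)
    for fmt, t in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{fmt}: {t * 1e3:.3f} ms")
    print("selected format:", best)
```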
ISSN: 2576-6996
DOI: 10.1109/ICCD63220.2024.00083