
Optimizing sparse matrix-vector multiplication on CUDA

Bibliographic Details
Main Authors: Zhuowei Wang, Xianbin Xu, Wuqing Zhao, Yuping Zhang, Shuibing He
Format: Conference Proceeding
Language: English
Description
Summary: In recent years, GPUs have attracted the attention of many application developers as powerful massively parallel systems. CUDA, as a general-purpose parallel computing architecture, makes GPUs an appealing choice for solving many complex computational problems more efficiently. In this paper, we discuss implementing and optimizing sparse matrix-vector multiplication (SpMV) on NVIDIA GPUs using the CUDA programming model. We outline three optimizations: (1) an optimized CSR storage format, (2) optimized thread mapping, and (3) avoidance of divergence judgments. We experimentally evaluate our optimizations on a GeForce 9600 GTX connected to a 64-bit Windows XP system. In comparison with NVIDIA's SpMV library and NVIDIA's CUDPP library, the results show that our optimized SpMV implementation on CUDA achieves better performance than these other SpMV implementations.
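To make the abstract concrete, the sketch below shows a baseline CSR SpMV kernel in CUDA with the simplest thread mapping (one thread per row). It is only an illustration of the general technique, assuming standard CSR arrays (row_ptr, col_idx, vals); the kernel name, launch configuration, and example matrix are invented here and are not taken from the paper, which builds its storage-format, thread-mapping, and divergence optimizations on top of such a baseline.

#include <cstdio>
#include <cuda_runtime.h>

// Baseline CSR SpMV: one thread computes one row of y = A * x.
// A is stored in CSR form as (row_ptr, col_idx, vals).
__global__ void spmv_csr_scalar(int num_rows,
                                const int *row_ptr,
                                const int *col_idx,
                                const float *vals,
                                const float *x,
                                float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float dot = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            dot += vals[j] * x[col_idx[j]];
        y[row] = dot;
    }
}

int main()
{
    // Small 3x3 example matrix in CSR (invented for illustration):
    // [1 0 2]
    // [0 3 0]
    // [4 0 5]
    const int num_rows = 3;
    int   h_row_ptr[] = {0, 2, 3, 5};
    int   h_col_idx[] = {0, 2, 1, 0, 2};
    float h_vals[]    = {1, 2, 3, 4, 5};
    float h_x[]       = {1, 1, 1};
    float h_y[num_rows];

    int *d_row_ptr, *d_col_idx;
    float *d_vals, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&d_col_idx, sizeof(h_col_idx));
    cudaMalloc(&d_vals,    sizeof(h_vals));
    cudaMalloc(&d_x,       sizeof(h_x));
    cudaMalloc(&d_y,       sizeof(h_y));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals,    h_vals,    sizeof(h_vals),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,       h_x,       sizeof(h_x),       cudaMemcpyHostToDevice);

    // Launch one thread per row.
    int threads = 128;
    int blocks  = (num_rows + threads - 1) / threads;
    spmv_csr_scalar<<<blocks, threads>>>(num_rows, d_row_ptr, d_col_idx, d_vals, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);

    for (int i = 0; i < num_rows; ++i)
        printf("y[%d] = %f\n", i, h_y[i]);   // expected: 3, 3, 9

    cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_vals);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

With one thread per row, threads in a warp read neighbouring rows whose nonzero counts may differ, which is exactly the kind of divergence and uncoalesced access the paper's optimized thread mapping and divergence avoidance aim to reduce.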
ISSN: 2155-1812
DOI: 10.1109/ICETC.2010.5529724