
Optimizing sparse matrix-vector multiplication on CUDA


Bibliographic Details
Main Authors:	Zhuowei Wang, Xianbin Xu, Wuqing Zhao, Yuping Zhang, Shuibing He
Format:	Conference Proceeding
Language:	English
Subjects:	Computer science education; Concurrent computing; CUDA; Educational institutions; GPUs; Graphics processing unit; Kernel; Libraries; NVIDIA's CUDDPA library; NVIDIA's SpMV library; Parallel processing; Parallel programming; Sparse matrices; SpMV

Description
Summary:	In recent years, GPUs have attracted the attention of many application developers as powerful massively parallel systems. CUDA, a general-purpose parallel computing architecture, makes GPUs an appealing choice for solving many complex computational problems more efficiently. In this paper, we discuss implementing optimized sparse matrix-vector multiplication (SpMV) on NVIDIA GPUs using the CUDA programming model. We outline three optimizations: (1) an optimized CSR storage format, (2) optimized thread mapping, and (3) avoiding divergence judgment. We experimentally evaluate our optimizations on a GeForce 9600 GTX connected to a 64-bit Windows XP system. In comparison with NVIDIA's SpMV library and NVIDIA's CUDDPA library, the results show that our optimized sparse matrix-vector multiplication on CUDA achieves better performance than the other SpMV implementations.
ISSN:	2155-1812
DOI:	10.1109/ICETC.2010.5529724
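This record does not describe the authors' optimized kernels, so the following is only background: a minimal C++ sketch, under stated assumptions, of the CSR (Compressed Sparse Row) layout and of one common "warp per row" thread mapping for SpMV, emulated serially on the CPU so it runs without a GPU. It is not the paper's code. The names `CsrMatrix`, `spmv_warp_per_row` and `kWarpSize` are hypothetical, and the 32-lane width assumes NVIDIA's warp size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// CSR storage: the nonzeros of row i occupy positions
// row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and values.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz, column of each nonzero
    std::vector<double> values;        // size nnz, value of each nonzero
};

// Assumed warp width (NVIDIA warps contain 32 threads).
constexpr std::size_t kWarpSize = 32;

// Serial emulation of a hypothetical "one warp per row" mapping:
// each of the 32 lanes strides through the row's nonzeros, then the
// lanes' partial sums are combined with a tree reduction, the way a
// GPU warp would combine them in shared memory.
std::vector<double> spmv_warp_per_row(const CsrMatrix& A,
                                      const std::vector<double>& x) {
    assert(A.row_ptr.size() == A.rows + 1);  // basic CSR shape check
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t row = 0; row < A.rows; ++row) {
        std::array<double, kWarpSize> partial{};  // one accumulator per lane
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            // Lane handles entries row_ptr[row] + lane, + lane + 32, ...
            for (std::size_t j = A.row_ptr[row] + lane; j < A.row_ptr[row + 1];
                 j += kWarpSize) {
                partial[lane] += A.values[j] * x[A.col_idx[j]];
            }
        }
        // Tree reduction over lanes with offsets 16, 8, 4, 2, 1.
        for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
            for (std::size_t lane = 0; lane < offset; ++lane) {
                partial[lane] += partial[lane + offset];
            }
        }
        y[row] = partial[0];
    }
    return y;
}
```

For example, the 3 x 3 matrix with rows [1 0 2], [0 3 0], [4 0 5] is stored as `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}`, `values = {1, 2, 3, 4, 5}`. Assigning a whole warp to each row keeps neighbouring lanes reading neighbouring entries of `values`, which favours coalesced memory access on a GPU; rows much shorter than 32 nonzeros leave most lanes idle, which is one reason thread mapping is a tuning target.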