Loading…

A high throughput LDPC decoder using a mid-range GPU

A standard-throughput-approaching LDPC decoder has been implemented on a mid-range GPU in this paper. Turbo-Decoding Message-Passing algorithm is applied to achieve high throughput. Different from traditional host managed multi-streams to hide host-device transfer delay, we use kernel maintained dat...

Full description

Saved in:
Bibliographic Details
Main Authors: Xie Wen, Jiao Xianjun, Jaaskelainen, Pekka, Kultala, Heikki, Chen Canfeng, Berg, Heikki, Bie Zhisong
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:A standard-throughput-approaching LDPC decoder has been implemented on a mid-range GPU in this paper. Turbo-Decoding Message-Passing algorithm is applied to achieve high throughput. Different from traditional host managed multi-streams to hide host-device transfer delay, we use kernel maintained data transfer scheme to achieve implicit data transfer between host memory and device shared memory, which eliminates an intermediate stage of global memory. Data type optimization, memory accessing optimization, and low complexity Soft-In Soft-Out algorithm are also used to improve efficiency. Through these optimization methods, the 802.11n LDPC decoder on NVIDIA GTX480 GPU, which is released in 2010 with Fermi architecture, has achieved a high throughput of 295Mb/s when decoding 512 codewords simultaneously, which is close to highest bit rate 300Mb/s with 20MHz bandwidth in 802.11n standard. Decoding 1024 and 4096 codewords achieve 330 and 365Mb/s. A 802.16e LDPC decoder is also implemented, 374Mb/s (512 codewords), 435Mb/s (1024 codewords) and 507Mb/s (4096 codewords) throughputs have been achieved.
ISSN:1520-6149
2379-190X
DOI:10.1109/ICASSP.2014.6855061