Real-time GPU-based software beamformer designed for advanced imaging methods research

Bibliographic Details
Main Authors:	Yiu, Billy Y. S.; Tsang, Ivan K. H.; Yu, Alfred C. H.
Format:	Conference Proceeding
Language:	English
Subjects:	Array signal processing; Graphics processing unit; graphics processing units; Imaging; Instruction sets; parallel processing; Pixel; plane wave imaging; Real time systems; software beamformer; synthetic aperture imaging; Ultrasonic imaging

Description
Summary:	High computational demand is a known technical hurdle for real-time implementation of advanced methods such as synthetic aperture imaging (SAI) and plane wave imaging (PWI), which work with the pre-beamform data of each array element. In this paper, we present the development of a software beamformer for SAI and PWI with real-time parallel processing capacity. Our beamformer design comprises a pipelined group of graphics processing units (GPUs) that are hosted within the same computer workstation. During operation, each available GPU is assigned to perform demodulation and beamforming for one frame of pre-beamform data acquired from one transmit firing (e.g. point firing for SAI). To facilitate parallel computation, the GPUs have been programmed to treat the calculation of depth pixels from the same image scanline as a block of processing threads that can be executed concurrently, and they repeat this process for all scanlines to obtain the entire frame of image data, i.e. a low-resolution image (LRI). To reduce processing latency due to repeated access of each GPU's global memory, we have made use of each thread block's fast shared memory (to store an entire line of pre-beamform data during demodulation), created texture memory pointers, and utilized global memory caches (to stream repeatedly used data samples during beamforming). Based on this beamformer architecture, a prototype platform has been implemented for SAI and PWI, and its LRI processing throughput has been measured for test datasets with a 40 MHz sampling rate, 32 receive channels, and imaging depths between 5 and 15 cm. When using two Fermi-class GPUs (GTX-470), our beamformer can compute LRIs of 512-by-255 pixels at over 3200 fps and 1300 fps for imaging depths of 5 cm and 15 cm, respectively. This processing throughput is roughly 3.2 times higher than that of a Tesla-class GPU (GTX-275).
ISSN:	1051-0117
DOI:	10.1109/ULTSYM.2010.5935689
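
Illustrative sketch (not part of the catalog record or the paper): the Summary describes computing each scanline's depth pixels as one block of concurrent threads, repeated over all scanlines to form a low-resolution image (LRI). The plain-Python code below mimics that work decomposition on the CPU with a basic delay-and-sum beamformer for a single point-source firing. It is a minimal sketch under stated assumptions, not the authors' GPU implementation: the speed of sound (1540 m/s), the element pitch in the usage note, nearest-sample delays, the absence of demodulation and apodization, and every function name are assumptions made for illustration. Only the 40 MHz sampling rate and the 32 receive channels come from the Summary.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s; assumed soft-tissue value, not given in the Summary
SAMPLING_RATE = 40e6     # Hz; from the Summary


def delay_index(tx_x, elem_x, px, pz):
    """Sample index of the echo from point (px, pz): transmit path plus receive path."""
    tx_dist = math.hypot(px - tx_x, pz)    # point source at (tx_x, 0) to the pixel
    rx_dist = math.hypot(px - elem_x, pz)  # pixel back to the receive element
    return round((tx_dist + rx_dist) / SPEED_OF_SOUND * SAMPLING_RATE)


def beamform_pixel(rf, element_x, tx_x, px, pz):
    """Delay-and-sum one pixel; on a GPU this would be the work of one thread."""
    total = 0.0
    for channel, elem_x in zip(rf, element_x):
        idx = delay_index(tx_x, elem_x, px, pz)
        if 0 <= idx < len(channel):  # ignore delays that fall outside the record
            total += channel[idx]
    return total


def beamform_scanline(rf, element_x, tx_x, line_x, depths):
    """All depth pixels of one scanline; the pixels are independent of each other,
    which is what allows them to run as one block of concurrent threads."""
    return [beamform_pixel(rf, element_x, tx_x, line_x, z) for z in depths]


def beamform_lri(rf, element_x, tx_x, line_xs, depths):
    """Repeat the scanline computation for every scanline to obtain one LRI."""
    return [beamform_scanline(rf, element_x, tx_x, x, depths) for x in line_xs]


def simulate_point_target(element_x, tx_x, sx, sz, n_samples):
    """Hypothetical test data: a unit spike per channel at the scatterer's echo delay."""
    rf = [[0.0] * n_samples for _ in element_x]
    for channel, elem_x in zip(rf, element_x):
        idx = delay_index(tx_x, elem_x, sx, sz)
        if 0 <= idx < n_samples:
            channel[idx] = 1.0
    return rf
```

As a usage example, one could build a 32-element array with an assumed 0.3 mm pitch, `element_x = [(i - 15.5) * 0.3e-3 for i in range(32)]`, generate data with `simulate_point_target(element_x, 0.0, 0.0, 0.02, 2048)`, and pass it to `beamform_lri`. All 32 channels then add coherently only at the pixel located on the scatterer, which is the behaviour a delay-and-sum beamformer is expected to show.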