Improving 3D lattice Boltzmann method stencil with asynchronous transfers on many-core processors

Bibliographic Details
Main Authors: Ho, Minh Quan; Obrecht, Christian; Tourancheau, Bernard; de Dinechin, Benoit Dupont; Hascoet, Julien
Format: Conference Proceeding
Language: English
Description
Summary: CPU-based many-core processors present an alternative to multicore CPU and GPU processors. In particular, the 93-petaflops Sunway supercomputer, built from clustered many-core processors, has opened a new era for high-performance computing that does not rely on GPU acceleration. However, memory bandwidth remains the main challenge for these architectures. This motivates our effort to optimize one of the most data-intensive kinds of stencil computation, namely three-dimensional applications of the lattice Boltzmann method (LBM). We propose optimizations for many-core processors that use local memory and asynchronous software prefetching, taking a representative 3D LBM solver as an example. We achieve a 33% performance gain on the Kalray MPPA-256 many-core processor by actively streaming data from/to local memory, compared to the "passive" OpenCL programming model.
ISSN: 2374-9628
DOI: 10.1109/PCCC.2017.8280472