
Scalable lattice Boltzmann solvers for CUDA GPU clusters

• An MPI-CUDA implementation of the lattice Boltzmann method is described.
• We propose a method to handle efficiently 3D partitions of the simulation domain.
• We study the performance of our implementation on a cluster using up to 24 GPUs.
• Peak performance as well as weak and strong scalability are sati...

Bibliographic Details
Published in: Parallel computing 2013-06, Vol.39 (6-7), p.259-270
Main Authors: Obrecht, Christian, Kuznik, Frédéric, Tourancheau, Bernard, Roux, Jean-Jacques
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503
cites cdi_FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503
container_end_page 270
container_issue 6-7
container_start_page 259
container_title Parallel computing
container_volume 39
creator Obrecht, Christian
Kuznik, Frédéric
Tourancheau, Bernard
Roux, Jean-Jacques
description • An MPI-CUDA implementation of the lattice Boltzmann method is described. • We propose a method to handle 3D partitions of the simulation domain efficiently. • We study the performance of our implementation on a cluster using up to 24 GPUs. • Peak performance as well as weak and strong scalability are satisfactory. The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint, it reduces to a regular data-parallel procedure and is therefore well suited to high-performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions, and even fewer mention GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large-scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurements were carried out on a small cluster. We show that the results are satisfactory, both in terms of data throughput and parallelisation efficiency.
doi_str_mv 10.1016/j.parco.2013.04.001
format article
fullrecord <record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_00931058v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167819113000458</els_id><sourcerecordid>1671579238</sourcerecordid><originalsourceid>FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503</originalsourceid><addsrcrecordid>eNqFkE1Lw0AURQdRsFZ_gZssdZH4XiaZj4WLWrUKBQXtephMJpgy7dSZtKC_3sSKS109uJz7uBxCzhEyBGRXy2yjg_FZDkgzKDIAPCAjFDxPOaXskIx6iqcCJR6TkxiXAMAKASMiXox2unI2cbrrWmOTG--6z5Ver5Po3c6GmDQ-JNPF7SSZPS8S47ax69NTctRoF-3Zzx2Txf3d6_QhnT_NHqeTeWqKQnSpRCtFQ2klKmAVVhJLWkENZWlozQTHnGmwsmSiMZzbCgpZM8QGZaMlL4GOyeX-75t2ahPalQ4fyutWPUzmasgAJEUoxQ579mLPboJ_39rYqVUbjXVOr63fRtU7wJLLnIr_UcpzVoDIWY_SPWqCjzHY5ncGghr0q6X61q8G_QqKftKw5Xrfsr2cXWuDiqa1a2PrNljTqdq3f_a_AEuui1M</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1372640826</pqid></control><display><type>article</type><title>Scalable lattice Boltzmann solvers for CUDA GPU clusters</title><source>ScienceDirect Freedom Collection</source><creator>Obrecht, Christian ; Kuznik, Frédéric ; Tourancheau, Bernard ; Roux, Jean-Jacques</creator><creatorcontrib>Obrecht, Christian ; Kuznik, Frédéric ; Tourancheau, Bernard ; Roux, Jean-Jacques</creatorcontrib><description>•An MPI-CUDA implementation of the lattice Boltzmann method is described.•We propose a method to handle efficiently 3D partitions of the simulation domain.•We study the performance of our implementation on a cluster using up to 24GPUs.•Peak performance as well as weak and strong scalability are satisfactory. The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well-suited to high performance computations. 
Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurement were carried out on a small cluster. We show that the results are satisfying, both in terms of data throughput and parallelisation efficiency.</description><identifier>ISSN: 0167-8191</identifier><identifier>EISSN: 1872-7336</identifier><identifier>DOI: 10.1016/j.parco.2013.04.001</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Clusters ; Computation ; Computational fluid dynamics ; Computer Science ; CUDA ; GPU clusters ; Kernels ; Lattice Boltzmann method ; Lattices ; Networking and Internet Architecture ; Performance measurement ; Solvers ; Three dimensional</subject><ispartof>Parallel computing, 2013-06, Vol.39 (6-7), p.259-270</ispartof><rights>2013 Elsevier B.V.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503</citedby><cites>FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503</cites><orcidid>0000-0003-3156-1874 ; 0000-0002-6892-9057 ; 0000-0001-5724-1823 ; 
0000-0001-6502-9689</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881,27903,27904</link.rule.ids><backlink>$$Uhttps://hal.science/hal-00931058$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Obrecht, Christian</creatorcontrib><creatorcontrib>Kuznik, Frédéric</creatorcontrib><creatorcontrib>Tourancheau, Bernard</creatorcontrib><creatorcontrib>Roux, Jean-Jacques</creatorcontrib><title>Scalable lattice Boltzmann solvers for CUDA GPU clusters</title><title>Parallel computing</title><description>•An MPI-CUDA implementation of the lattice Boltzmann method is described.•We propose a method to handle efficiently 3D partitions of the simulation domain.•We study the performance of our implementation on a cluster using up to 24GPUs.•Peak performance as well as weak and strong scalability are satisfactory. The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well-suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurement were carried out on a small cluster. 
We show that the results are satisfying, both in terms of data throughput and parallelisation efficiency.</description><subject>Clusters</subject><subject>Computation</subject><subject>Computational fluid dynamics</subject><subject>Computer Science</subject><subject>CUDA</subject><subject>GPU clusters</subject><subject>Kernels</subject><subject>Lattice Boltzmann method</subject><subject>Lattices</subject><subject>Networking and Internet Architecture</subject><subject>Performance measurement</subject><subject>Solvers</subject><subject>Three dimensional</subject><issn>0167-8191</issn><issn>1872-7336</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNqFkE1Lw0AURQdRsFZ_gZssdZH4XiaZj4WLWrUKBQXtephMJpgy7dSZtKC_3sSKS109uJz7uBxCzhEyBGRXy2yjg_FZDkgzKDIAPCAjFDxPOaXskIx6iqcCJR6TkxiXAMAKASMiXox2unI2cbrrWmOTG--6z5Ver5Po3c6GmDQ-JNPF7SSZPS8S47ax69NTctRoF-3Zzx2Txf3d6_QhnT_NHqeTeWqKQnSpRCtFQ2klKmAVVhJLWkENZWlozQTHnGmwsmSiMZzbCgpZM8QGZaMlL4GOyeX-75t2ahPalQ4fyutWPUzmasgAJEUoxQ579mLPboJ_39rYqVUbjXVOr63fRtU7wJLLnIr_UcpzVoDIWY_SPWqCjzHY5ncGghr0q6X61q8G_QqKftKw5Xrfsr2cXWuDiqa1a2PrNljTqdq3f_a_AEuui1M</recordid><startdate>20130601</startdate><enddate>20130601</enddate><creator>Obrecht, Christian</creator><creator>Kuznik, Frédéric</creator><creator>Tourancheau, Bernard</creator><creator>Roux, Jean-Jacques</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>1XC</scope><scope>VOOES</scope><orcidid>https://orcid.org/0000-0003-3156-1874</orcidid><orcidid>https://orcid.org/0000-0002-6892-9057</orcidid><orcidid>https://orcid.org/0000-0001-5724-1823</orcidid><orcidid>https://orcid.org/0000-0001-6502-9689</orcidid></search><sort><creationdate>20130601</creationdate><title>Scalable lattice Boltzmann solvers for CUDA GPU 
clusters</title><author>Obrecht, Christian ; Kuznik, Frédéric ; Tourancheau, Bernard ; Roux, Jean-Jacques</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Clusters</topic><topic>Computation</topic><topic>Computational fluid dynamics</topic><topic>Computer Science</topic><topic>CUDA</topic><topic>GPU clusters</topic><topic>Kernels</topic><topic>Lattice Boltzmann method</topic><topic>Lattices</topic><topic>Networking and Internet Architecture</topic><topic>Performance measurement</topic><topic>Solvers</topic><topic>Three dimensional</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Obrecht, Christian</creatorcontrib><creatorcontrib>Kuznik, Frédéric</creatorcontrib><creatorcontrib>Tourancheau, Bernard</creatorcontrib><creatorcontrib>Roux, Jean-Jacques</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><jtitle>Parallel computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Obrecht, Christian</au><au>Kuznik, Frédéric</au><au>Tourancheau, Bernard</au><au>Roux, Jean-Jacques</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Scalable lattice Boltzmann solvers for CUDA 
GPU clusters</atitle><jtitle>Parallel computing</jtitle><date>2013-06-01</date><risdate>2013</risdate><volume>39</volume><issue>6-7</issue><spage>259</spage><epage>270</epage><pages>259-270</pages><issn>0167-8191</issn><eissn>1872-7336</eissn><abstract>•An MPI-CUDA implementation of the lattice Boltzmann method is described.•We propose a method to handle efficiently 3D partitions of the simulation domain.•We study the performance of our implementation on a cluster using up to 24GPUs.•Peak performance as well as weak and strong scalability are satisfactory. The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well-suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurement were carried out on a small cluster. We show that the results are satisfying, both in terms of data throughput and parallelisation efficiency.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.parco.2013.04.001</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-3156-1874</orcidid><orcidid>https://orcid.org/0000-0002-6892-9057</orcidid><orcidid>https://orcid.org/0000-0001-5724-1823</orcidid><orcidid>https://orcid.org/0000-0001-6502-9689</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0167-8191
ispartof Parallel computing, 2013-06, Vol.39 (6-7), p.259-270
issn 0167-8191
1872-7336
language eng
recordid cdi_hal_primary_oai_HAL_hal_00931058v1
source ScienceDirect Freedom Collection
subjects Clusters
Computation
Computational fluid dynamics
Computer Science
CUDA
GPU clusters
Kernels
Lattice Boltzmann method
Lattices
Networking and Internet Architecture
Performance measurement
Solvers
Three dimensional
title Scalable lattice Boltzmann solvers for CUDA GPU clusters
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T02%3A03%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Scalable%20lattice%20Boltzmann%20solvers%20for%20CUDA%20GPU%20clusters&rft.jtitle=Parallel%20computing&rft.au=Obrecht,%20Christian&rft.date=2013-06-01&rft.volume=39&rft.issue=6-7&rft.spage=259&rft.epage=270&rft.pages=259-270&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/j.parco.2013.04.001&rft_dat=%3Cproquest_hal_p%3E1671579238%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c448t-91e98f33b8b06b1b9153b0d055c3d687126a0e9568fc77eb049d611f19fa97503%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1372640826&rft_id=info:pmid/&rfr_iscdi=true