
Multi-core-CPU and GPU-accelerated radiative transfer models based on the discrete ordinate method

Bibliographic Details
Published in: Computer Physics Communications, 2014-12, Vol. 185 (12), p. 3079-3089
Main Authors: Efremenko, Dmitry S., Loyola, Diego G., Doicu, Adrian, Spurr, Robert J.D.
Format: Article
Language: English
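Subjects: Central processing units; Computation; Computer simulation; CUDA; Devices; Discrete ordinate method; Mathematical models; Optimization; Radiative transfer; Radiative transfer models; Workload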
container_end_page 3089
container_issue 12
container_start_page 3079
container_title Computer physics communications
container_volume 185
creator Efremenko, Dmitry S.
Loyola, Diego G.
Doicu, Adrian
Spurr, Robert J.D.
description The operational processing of remote sensing data usually requires high-performance radiative transfer model (RTM) simulations. Multi-core CPUs and Graphics Processing Units (GPUs) have both been used for such computationally intensive parallel workloads. In this paper, we compare multi-core and GPU implementations of an RTM based on the discrete ordinate solution method. To target GPUs, the original CPU code has been redesigned using NVIDIA's C-based Compute Unified Device Architecture (CUDA). GPU memory management is crucial for performance. To cope with GPU register limitations, we have adapted an RTM based on the matrix operator technique together with the interaction principle for multilayer atmospheric systems. The speed-up of this implementation depends on the number of discrete ordinates used in the RTM. To reduce the CPU/GPU communication overhead, we exploit asynchronous data transfer between host and device. To obtain optimal performance, we also overlap CPU and GPU computations by distributing the workload between them. With GPUs, we achieve a 20x–40x speed-up for the multi-stream RTM and a 50x speed-up for the two-stream RTM with respect to the original single-threaded CPU codes. Based on these performance tests, an optimal workload distribution scheme between GPU and CPU is proposed. Additionally, CPU/GPU benchmark results for basic matrix operations are given. Finally, we discuss the performance obtained with the multi-core-CPU and GPU implementations of the RTM.
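The abstract mentions combining atmospheric layers with the matrix operator technique and the interaction principle. As a hedged illustration only (not the authors' CUDA code), the textbook adding equations combine a top layer a and a bottom layer b: with $D = (I - R^*_a R_b)^{-1} T_a$ and $U = (I - R_b R^*_a)^{-1} T^*_b$, the combined layer has $R_{ab} = R_a + T^*_a R_b D$, $T_{ab} = T_b D$, $R^*_{ab} = R^*_b + T_b R^*_a U$ and $T^*_{ab} = T^*_a U$, where starred quantities refer to light incident from below. The sketch assumes each layer is already described by N x N matrices over the discrete ordinates (quadrature weights folded in), that NumPy is available, and that `add_layers`, `add_stack` and `symmetric_layer` are hypothetical helper names.

```python
import numpy as np


def symmetric_layer(R, T):
    """Hypothetical helper: a homogeneous layer whose reflection and
    transmission are the same for light from above and from below."""
    R = np.asarray(R, dtype=float)
    T = np.asarray(T, dtype=float)
    return {"R": R, "Rs": R, "T": T, "Ts": T}


def add_layers(top, bottom):
    """Combine two layers with the adding equations (interaction principle).

    Each layer is a dict of N x N matrices over the discrete ordinates:
      R  : reflection of light incident from above
      Rs : reflection of light incident from below
      T  : downward transmission
      Ts : upward transmission
    """
    Ra, Rsa, Ta, Tsa = top["R"], top["Rs"], top["T"], top["Ts"]
    Rb, Rsb, Tb, Tsb = bottom["R"], bottom["Rs"], bottom["T"], bottom["Ts"]
    eye = np.eye(Ra.shape[0])

    # Downward field at the interface for light entering from the top:
    # D = (I - Rs_a R_b)^-1 T_a, i.e. all inter-layer reflections summed.
    D = np.linalg.solve(eye - Rsa @ Rb, Ta)
    # Upward field at the interface for light entering from the bottom:
    # U = (I - R_b Rs_a)^-1 Ts_b
    U = np.linalg.solve(eye - Rb @ Rsa, Tsb)

    return {
        "R": Ra + Tsa @ Rb @ D,    # R_ab  = R_a + Ts_a R_b D
        "T": Tb @ D,               # T_ab  = T_b D
        "Rs": Rsb + Tb @ Rsa @ U,  # Rs_ab = Rs_b + T_b Rs_a U
        "Ts": Tsa @ U,             # Ts_ab = Ts_a U
    }


def add_stack(layers):
    """Fold a list of layers (top layer first) into one equivalent layer."""
    result = layers[0]
    for layer in layers[1:]:
        result = add_layers(result, layer)
    return result


if __name__ == "__main__":
    # Two identical single-stream (1 x 1) layers with made-up values.
    layer = symmetric_layer([[0.2]], [[0.7]])
    combined = add_stack([layer, layer])
    print(combined["R"][0, 0], combined["T"][0, 0])
```

For the made-up scalar layer (R = 0.2, T = 0.7), two copies combine to R = 0.2 + 0.098/0.96 ≈ 0.3021 and T = 0.49/0.96 ≈ 0.5104. Because the combination is associative, a stack can be folded in any grouping; whether and how the paper exploits this on the GPU is not stated in this record.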
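The abstract reports overlapping CPU and GPU computations by distributing the workload between them, and proposes an optimal distribution scheme whose details are not given in this record. A common generic starting point, shown here as a hedged sketch and not as the paper's scheme, is a static split proportional to each device's measured throughput, so that both finish at roughly the same time. The function names `split_workload` and `makespan` are hypothetical, and the example rates (a GPU about 40x one CPU core, echoing the upper end of the reported multi-stream speed-up, and four CPU threads scaling ideally) are assumptions for illustration.

```python
def split_workload(n_items, gpu_rate, cpu_rate):
    """Static GPU/CPU split proportional to measured throughput.

    gpu_rate and cpu_rate are items processed per unit time by the GPU and
    by the (multi-threaded) CPU. Giving each device a share proportional to
    its rate makes both finish at roughly the same time.
    Returns (n_gpu, n_cpu).
    """
    if gpu_rate < 0 or cpu_rate < 0 or gpu_rate + cpu_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    n_gpu = round(n_items * gpu_rate / (gpu_rate + cpu_rate))
    return n_gpu, n_items - n_gpu


def makespan(n_gpu, n_cpu, gpu_rate, cpu_rate):
    """Wall time when both devices run concurrently (transfers ignored)."""
    t_gpu = n_gpu / gpu_rate if n_gpu else 0.0
    t_cpu = n_cpu / cpu_rate if n_cpu else 0.0
    return max(t_gpu, t_cpu)


if __name__ == "__main__":
    # Hypothetical rates in units of single-core throughput.
    n_gpu, n_cpu = split_workload(10_000, 40.0, 4.0)
    print(n_gpu, n_cpu)
    print(makespan(n_gpu, n_cpu, 40.0, 4.0))
    print(makespan(10_000, 0, 40.0, 4.0))  # GPU alone, for comparison
```

With these assumed rates the split is 9091 items on the GPU and 909 on the CPU, giving a concurrent wall time of about 227 single-core units versus 250 for the GPU alone. In practice host/device transfer costs and imperfect thread scaling shrink this gain; a dynamic work queue is an alternative when per-item cost varies.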
doi_str_mv 10.1016/j.cpc.2014.07.018
format article
publisher Elsevier B.V.
rights 2014 Elsevier B.V.
date 2014-12-01
tpages 11
orcidid https://orcid.org/0000-0002-7449-5072
identifier ISSN: 0010-4655
ispartof Computer physics communications, 2014-12, Vol.185 (12), p.3079-3089
issn 0010-4655
1879-2944
language eng
recordid cdi_proquest_miscellaneous_1709764186
source ScienceDirect Freedom Collection
subjects Central processing units
Computation
Computer simulation
CUDA
Devices
Discrete ordinate method
Mathematical models
Optimization
Radiative transfer
Radiative transfer models
Workload
title Multi-core-CPU and GPU-accelerated radiative transfer models based on the discrete ordinate method
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T07%3A21%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-core-CPU%20and%20GPU-accelerated%20radiative%20transfer%20models%20based%20on%20the%20discrete%20ordinate%20method&rft.jtitle=Computer%20physics%20communications&rft.au=Efremenko,%20Dmitry%20S.&rft.date=2014-12-01&rft.volume=185&rft.issue=12&rft.spage=3079&rft.epage=3089&rft.pages=3079-3089&rft.issn=0010-4655&rft.eissn=1879-2944&rft_id=info:doi/10.1016/j.cpc.2014.07.018&rft_dat=%3Cproquest_cross%3E1709764186%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c470t-a37a4e93a72d4ff864a4adbb892fdd603c636c67074f21410855a99ea7521d1e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1709764186&rft_id=info:pmid/&rfr_iscdi=true