DGSF: Disaggregated GPUs for Serverless Functions

Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from...

Bibliographic Details
Main Authors:	Fingler, Henrique; Zhu, Zhiting; Yoon, Esther; Jia, Zhipeng; Witchel, Emmett; Rossbach, Christopher J.
Format:	Conference Proceeding
Language:	English
Subjects:	API remoting; Cloud computing; Distributed processing; FaaS; GPU; Graphics processing units; Machine learning; Production; Prototypes; Runtime; serverless

container_end_page	750
container_start_page	739
creator	Fingler, Henrique; Zhu, Zhiting; Yoon, Esther; Jia, Zhipeng; Witchel, Emmett; Rossbach, Christopher J.
description	Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF's API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.
doi_str_mv	10.1109/IPDPS53621.2022.00077
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_9820659</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9820659</ieee_id><sourcerecordid>9820659</sourcerecordid><originalsourceid>FETCH-LOGICAL-i203t-7e4b300d7d9db761aca894983157577e2019bf074575eaa0d46acd633dd44f963</originalsourceid><addsrcrecordid>eNotzM1Kw0AUQOFREKy1TyBCXiDxzu_NuJPGxELBQOy6TDI3YaS2MhMF396Crg5n8zF2z6HgHOzDpq3aTksjeCFAiAIAEC_YymLJjdGq5GDsJVtwLSEXgPqa3aT0DiBAKrtgvGq6-jGrQnLTFGlyM_msaXcpG08x6yh-UzxQSln9dRzmcDqmW3Y1ukOi1X-XbFc_v61f8u1rs1k_bfNwpuccSfUSwKO3vkfD3eBKq2wpuUaNSAK47UdAdV5yDrwybvBGSu-VGq2RS3b35wYi2n_G8OHiz96WAoy28hclVEOs</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>DGSF: Disaggregated GPUs for Serverless Functions</title><source>IEEE Xplore All Conference Series</source><creator>Fingler, Henrique ; Zhu, Zhiting ; Yoon, Esther ; Jia, Zhipeng ; Witchel, Emmett ; Rossbach, Christopher J.</creator><creatorcontrib>Fingler, Henrique ; Zhu, Zhiting ; Yoon, Esther ; Jia, Zhipeng ; Witchel, Emmett ; Rossbach, Christopher J.</creatorcontrib><description>Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. 
Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF's API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.</description><identifier>EISSN: 1530-2075</identifier><identifier>EISBN: 9781665481069</identifier><identifier>EISBN: 1665481064</identifier><identifier>DOI: 10.1109/IPDPS53621.2022.00077</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>API remoting ; Cloud computing ; Distributed processing ; FaaS ; GPU ; Graphics processing units ; Machine learning ; Production ; Prototypes ; Runtime ; serverless</subject><ispartof>2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022, 
p.739-750</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9820659$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9820659$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fingler, Henrique</creatorcontrib><creatorcontrib>Zhu, Zhiting</creatorcontrib><creatorcontrib>Yoon, Esther</creatorcontrib><creatorcontrib>Jia, Zhipeng</creatorcontrib><creatorcontrib>Witchel, Emmett</creatorcontrib><creatorcontrib>Rossbach, Christopher J.</creatorcontrib><title>DGSF: Disaggregated GPUs for Serverless Functions</title><title>2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</title><addtitle>IPDPS</addtitle><description>Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. 
We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF's API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.</description><subject>API remoting</subject><subject>Cloud computing</subject><subject>Distributed processing</subject><subject>FaaS</subject><subject>GPU</subject><subject>Graphics processing units</subject><subject>Machine learning</subject><subject>Production</subject><subject>Prototypes</subject><subject>Runtime</subject><subject>serverless</subject><issn>1530-2075</issn><isbn>9781665481069</isbn><isbn>1665481064</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2022</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotzM1Kw0AUQOFREKy1TyBCXiDxzu_NuJPGxELBQOy6TDI3YaS2MhMF396Crg5n8zF2z6HgHOzDpq3aTksjeCFAiAIAEC_YymLJjdGq5GDsJVtwLSEXgPqa3aT0DiBAKrtgvGq6-jGrQnLTFGlyM_msaXcpG08x6yh-UzxQSln9dRzmcDqmW3Y1ukOi1X-XbFc_v61f8u1rs1k_bfNwpuccSfUSwKO3vkfD3eBKq2wpuUaNSAK47UdAdV5yDrwybvBGSu-VGq2RS3b35wYi2n_G8OHiz96WAoy28hclVEOs</recordid><startdate>202205</startdate><enddate>202205</enddate><creator>Fingler, Henrique</creator><creator>Zhu, Zhiting</creator><creator>Yoon, Esther</creator><creator>Jia, Zhipeng</creator><creator>Witchel, Emmett</creator><creator>Rossbach, Christopher 
J.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>202205</creationdate><title>DGSF: Disaggregated GPUs for Serverless Functions</title><author>Fingler, Henrique ; Zhu, Zhiting ; Yoon, Esther ; Jia, Zhipeng ; Witchel, Emmett ; Rossbach, Christopher J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i203t-7e4b300d7d9db761aca894983157577e2019bf074575eaa0d46acd633dd44f963</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2022</creationdate><topic>API remoting</topic><topic>Cloud computing</topic><topic>Distributed processing</topic><topic>FaaS</topic><topic>GPU</topic><topic>Graphics processing units</topic><topic>Machine learning</topic><topic>Production</topic><topic>Prototypes</topic><topic>Runtime</topic><topic>serverless</topic><toplevel>online_resources</toplevel><creatorcontrib>Fingler, Henrique</creatorcontrib><creatorcontrib>Zhu, Zhiting</creatorcontrib><creatorcontrib>Yoon, Esther</creatorcontrib><creatorcontrib>Jia, Zhipeng</creatorcontrib><creatorcontrib>Witchel, Emmett</creatorcontrib><creatorcontrib>Rossbach, Christopher J.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library Online</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fingler, Henrique</au><au>Zhu, Zhiting</au><au>Yoon, Esther</au><au>Jia, Zhipeng</au><au>Witchel, Emmett</au><au>Rossbach, Christopher 
J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>DGSF: Disaggregated GPUs for Serverless Functions</atitle><btitle>2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)</btitle><stitle>IPDPS</stitle><date>2022-05</date><risdate>2022</risdate><spage>739</spage><epage>750</epage><pages>739-750</pages><eissn>1530-2075</eissn><eisbn>9781665481069</eisbn><eisbn>1665481064</eisbn><coden>IEEPAD</coden><abstract>Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF's API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. 
We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.</abstract><pub>IEEE</pub><doi>10.1109/IPDPS53621.2022.00077</doi><tpages>12</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 1530-2075
ispartof	2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022, p.739-750
issn	1530-2075
language	eng
recordid	cdi_ieee_primary_9820659
source	IEEE Xplore All Conference Series
subjects	API remoting; Cloud computing; Distributed processing; FaaS; GPU; Graphics processing units; Machine learning; Production; Prototypes; Runtime; serverless
title	DGSF: Disaggregated GPUs for Serverless Functions
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T07%3A26%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=DGSF:%20Disaggregated%20GPUs%20for%20Serverless%20Functions&rft.btitle=2022%20IEEE%20International%20Parallel%20and%20Distributed%20Processing%20Symposium%20(IPDPS)&rft.au=Fingler,%20Henrique&rft.date=2022-05&rft.spage=739&rft.epage=750&rft.pages=739-750&rft.eissn=1530-2075&rft.coden=IEEPAD&rft_id=info:doi/10.1109/IPDPS53621.2022.00077&rft.eisbn=9781665481069&rft.eisbn_list=1665481064&rft_dat=%3Cieee_CHZPO%3E9820659%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-7e4b300d7d9db761aca894983157577e2019bf074575eaa0d46acd633dd44f963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9820659&rfr_iscdi=true