SMaLL: A Software Framework for portable Machine Learning Libraries

Interest in deploying Deep Neural Network (DNN) inference on edge devices has produced an explosion in the number and types of hardware platforms to target. While a high-level programming interface such as TensorFlow can be readily ported across devices, high-performance inference implementations rely on a good mapping of the high-level interface onto the target hardware platform. Commonly, this mapping uses optimizing compilers to generate code at compile time, or high-performance vendor libraries that have been specialized to the target platform. Both approaches rely on expert knowledge to produce the mapping, which can be time-consuming and difficult to extend to new architectures. In this work, we present SMaLL, a DNN library framework that is easily extensible to new architectures. The framework uses a unified loop structure and a shared, cache-friendly data format across all intermediate layers, eliminating the time and memory overheads incurred by data transformations between layers. A layer is implemented by simply specifying its dimensions and a kernel, the key computing operations of that layer. The unified loop structure and kernel abstraction allow code to be reused across layers and computing platforms; supporting a new architecture requires redesigning only the few hundred lines of kernel code. To show the benefits of this approach, we have developed software that supports a range of layer types and computing platforms and is easily extensible for rapidly instantiating high-performance DNN libraries. We evaluate our software by instantiating networks from the TinyMLPerf benchmark suite on 5 ARM platforms and 1 x86 platform (an AMD Zen 2). Our framework shows end-to-end performance that is comparable to or better than ML frameworks such as TensorFlow, TVM, and LibTorch.
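The abstract's central idea, a fixed outer loop nest shared by every layer with only a small platform-specific kernel swapped per architecture, can be sketched in a few lines of C++. This is an illustrative sketch only: the names (Kernel, dot_kernel, conv1d) are hypothetical and do not reflect SMaLL's actual API.

    #include <cstddef>
    #include <vector>

    // A "kernel" is the small, platform-specific inner computation of a
    // layer. Porting to a new architecture means rewriting only functions
    // of this shape (e.g., with NEON or AVX intrinsics); the loop nest
    // below stays the same.
    using Kernel = void (*)(const float* in, const float* weights,
                            float* out, std::size_t k /* reduction length */);

    // Scalar stand-in for a vectorized dot-product microkernel.
    void dot_kernel(const float* in, const float* w, float* out,
                    std::size_t k) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < k; ++i) acc += in[i] * w[i];
        *out = acc;
    }

    // Shared loop structure: every layer is "loop over outputs, call the
    // kernel on the inputs contributing to each output". Shown here for a
    // 1-D convolution with stride 1 and no padding; other layers reuse the
    // same pattern with different dimensions and kernels.
    void conv1d(const std::vector<float>& in, const std::vector<float>& w,
                std::vector<float>& out, Kernel kernel) {
        const std::size_t k = w.size();
        for (std::size_t o = 0; o + k <= in.size(); ++o)
            kernel(in.data() + o, w.data(), out.data() + o, k);
    }

A caller would size out to in.size() - w.size() + 1 elements. Swapping dot_kernel for a vectorized variant changes only those few lines, which is the portability argument the paper makes.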

Bibliographic Details
Published in: arXiv.org, 2023-03
Main Authors: Sridhar, Upasana; Tukanov, Nicholai; Binder, Elliott; Low, Tze Meng; McMillan, Scott; Schatz, Martin D
Format: Article
Language: English
Subjects: Artificial neural networks; Computation; Extensibility; Hardware; Inference; Kernels; Libraries; Machine learning; Mapping; Performance evaluation; Platforms; Software
EISSN: 2331-8422