SMaLL: A Software Framework for portable Machine Learning Libraries

Interest in deploying Deep Neural Network (DNN) inference on edge devices has produced an explosion in the number and types of hardware platforms to target. While a high-level programming interface such as TensorFlow can be readily ported across devices, high-performance inference implementations rely on a good mapping of the high-level interface onto the target hardware platform. Commonly, this mapping uses optimizing compilers to generate code at compile time, or high-performance vendor libraries that have been specialized to the target platform. Both approaches rely on expert knowledge to produce the mapping, which can be time-consuming and difficult to extend to new architectures. In this work, we present SMaLL, a DNN library framework that is easily extensible to new architectures. The framework uses a unified loop structure and a shared, cache-friendly data format across all intermediate layers, eliminating the time and memory overheads incurred by data transformations between layers. A layer is implemented by simply specifying its dimensions and a kernel, the key computing operations of that layer. The unified loop structure and kernel abstraction allow code to be reused across layers and computing platforms; supporting a new architecture requires redesigning only the few hundred lines of kernel code. To show the benefits of this approach, we have developed software that supports a range of layer types and computing platforms and is easily extensible for rapidly instantiating high-performance DNN libraries. We evaluate our software by instantiating networks from the TinyMLPerf benchmark suite on 5 ARM platforms and 1 x86 platform (an AMD Zen 2). Our framework shows end-to-end performance that is comparable to or better than ML frameworks such as TensorFlow, TVM, and LibTorch.
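The abstract's central idea, a fixed outer loop nest shared by every layer with only a small platform-specific kernel swapped per architecture, can be sketched in a few lines of C++. This is an illustrative sketch only: the names (Kernel, dot_kernel, conv1d) are hypothetical and do not reflect SMaLL's actual API.

    #include <cstddef>
    #include <vector>

    // A "kernel" is the small, platform-specific inner computation of a
    // layer. Porting to a new architecture means rewriting only functions
    // of this shape (e.g., with NEON or AVX intrinsics); the loop nest
    // below stays the same.
    using Kernel = void (*)(const float* in, const float* weights,
                            float* out, std::size_t k /* reduction length */);

    // Scalar stand-in for a vectorized dot-product microkernel.
    void dot_kernel(const float* in, const float* w, float* out,
                    std::size_t k) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < k; ++i) acc += in[i] * w[i];
        *out = acc;
    }

    // Shared loop structure: every layer is "loop over outputs, call the
    // kernel on the inputs contributing to each output". Shown here for a
    // 1-D convolution with stride 1 and no padding; other layers reuse the
    // same pattern with different dimensions and kernels.
    void conv1d(const std::vector<float>& in, const std::vector<float>& w,
                std::vector<float>& out, Kernel kernel) {
        const std::size_t k = w.size();
        for (std::size_t o = 0; o + k <= in.size(); ++o)
            kernel(in.data() + o, w.data(), out.data() + o, k);
    }

A caller would size out to in.size() - w.size() + 1 elements. Swapping dot_kernel for a vectorized variant changes only those few lines, which is the portability argument the paper makes.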

Bibliographic Details
Published in: arXiv.org, 2023-03
Main Authors: Sridhar, Upasana; Tukanov, Nicholai; Binder, Elliott; Low, Tze Meng; McMillan, Scott; Schatz, Martin D
Format: Article
Language: English
Subjects: Artificial neural networks; Computation; Extensibility; Hardware; Inference; Kernels; Libraries; Machine learning; Mapping; Performance evaluation; Platforms; Software
EISSN: 2331-8422