SMaLL: A Software Framework for portable Machine Learning Libraries
Interest in deploying Deep Neural Network (DNN) inference on edge devices has resulted in an explosion in the number and types of hardware platforms to use. While a high-level programming interface, such as TensorFlow, can be readily ported across different devices, high-performance inference implementations rely on a good mapping of the high-level interface to the target hardware platform. Commonly, this mapping may use optimizing compilers to generate code at compile time or high-performance vendor libraries that have been specialized to the target platform. Both approaches rely on expert knowledge to produce the mapping, which may be time-consuming and difficult to extend to new architectures. In this work, we present a DNN library framework, SMaLL, that is easily extensible to new architectures. The framework uses a unified loop structure and shared, cache-friendly data format across all intermediate layers, eliminating the time and memory overheads incurred by data transformation between layers. Layers are implemented by simply specifying the layer's dimensions and a kernel -- the key computing operations of each layer. The unified loop structure and kernel abstraction allow us to reuse code across layers and computing platforms. New architectures only require the hundreds of lines in the kernel to be redesigned. To show the benefits of our approach, we have developed software that supports a range of layer types and computing platforms, which is easily extensible for rapidly instantiating high-performance DNN libraries. We evaluate our software by instantiating networks from the TinyMLPerf benchmark suite on 5 ARM platforms and 1 x86 platform (an AMD Zen 2). Our framework shows end-to-end performance that is comparable to or better than ML frameworks such as TensorFlow, TVM and LibTorch.
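The abstract's central idea -- one unified loop nest reused across layers, with only a small per-layer kernel swapped in -- can be sketched as follows. This is an illustrative sketch in Python, not the SMaLL API; all names here (`unified_layer`, `conv_kernel`, `maxpool_kernel`) are hypothetical:

```python
# Hypothetical sketch of a "unified loop + swappable kernel" design:
# every layer shares the same loop nest over output elements, and only
# the innermost kernel function changes per layer type (or per platform).

def unified_layer(inputs, weights, out_h, out_w, stride, k_h, k_w, kernel):
    """One loop nest shared by all layers; `kernel` supplies the per-window math."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Gather the k_h x k_w input window feeding this output element.
            window = [inputs[i * stride + di][j * stride + dj]
                      for di in range(k_h) for dj in range(k_w)]
            out[i][j] = kernel(window, weights)
    return out

# Two "layers" that differ only in their kernel, so the loop nest is reused.
def conv_kernel(window, weights):
    """Dot product of the input window with the filter weights."""
    return sum(x * w for x, w in zip(window, weights))

def maxpool_kernel(window, _weights):
    """Max pooling ignores weights entirely."""
    return max(window)

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
w = [1.0, 0.0, 0.0, 1.0]  # 2x2 filter, flattened row-major

conv = unified_layer(x, w, 2, 2, 1, 2, 2, conv_kernel)      # [[6.0, 8.0], [12.0, 14.0]]
pool = unified_layer(x, None, 2, 2, 1, 2, 2, maxpool_kernel)  # [[5.0, 6.0], [8.0, 9.0]]
```

In the paper's design the kernel is also where platform-specific vectorization lives, so porting to a new architecture means rewriting only that small function.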
Published in: | arXiv.org, 2023-03 |
---|---|
Main Authors: | Sridhar, Upasana; Tukanov, Nicholai; Binder, Elliott; Tze Meng Low; McMillan, Scott; Schatz, Martin D |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks; Computation; Extensibility; Hardware; Inference; Kernels; Libraries; Machine learning; Mapping; Performance evaluation; Platforms; Software |
container_title | arXiv.org |
---|---|
creator | Sridhar, Upasana; Tukanov, Nicholai; Binder, Elliott; Tze Meng Low; McMillan, Scott; Schatz, Martin D |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-03 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2785002448 |
source | Publicly Available Content Database |
subjects | Artificial neural networks; Computation; Extensibility; Hardware; Inference; Kernels; Libraries; Machine learning; Mapping; Performance evaluation; Platforms; Software |
title | SMaLL: A Software Framework for portable Machine Learning Libraries |