A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine

The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption is...

Bibliographic Details
Main Authors:	Yuan, Geng; Ma, Xiaolong; Lin, Sheng; Li, Zhengang; Deng, Jieren; Ding, Caiwen
Format:	Conference Proceeding
Language:	English
Subjects:	Embedded systems; Power demand; Quantization (signal); System performance; Torque; Training; Writing

container_end_page	42
container_start_page	37
creator	Yuan, Geng; Ma, Xiaolong; Lin, Sheng; Li, Zhengang; Deng, Jieren; Ding, Caiwen
description	The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size make DNNs prohibitive for embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin-orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM-based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs, since SOT-MRAM offers near-zero standby power, high density, and non-volatility. However, the drawbacks of SOT-MRAM-based PIM engines, such as high write latency and the need for low bit-width data, make them less attractive as energy-efficient DNN accelerators. To mitigate these drawbacks, we propose an ultra-energy-efficient framework that applies model compression techniques, including weight pruning and quantization, at the software level while taking the architecture of the SOT-MRAM PIM into account. We incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee solution feasibility and satisfy the SOT-MRAM hardware constraints. Thus, the footprint and power consumption of the SOT-MRAM PIM can be reduced while the overall system performance rate (frames per second) increases, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices. Our experimental results show that the accuracy and compression rate of our proposed framework consistently outperform the reference works, while the efficiency (area and power) and performance rate of the SOT-MRAM PIM engine are significantly improved.
doi_str_mv	10.1109/SOCC49529.2020.9524757
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_9524757</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9524757</ieee_id><sourcerecordid>9524757</sourcerecordid><originalsourceid>FETCH-LOGICAL-i203t-9dcdb3a0e487b531ec8dff0ae1d1188287fed60b7747dd51c870cc79bfc50d963</originalsourceid><addsrcrecordid>eNotj9FKwzAYRqMguE2fQJC8QOqfNGmSy1I3nayruAnejTb5O6q2HYkge3sVd_Wdi8OBj5BbDgnnYO82VVFIq4RNBAhIfklqpc_IlGthuNEyezsnE8EzybiG7JJMY3wHkAqsmJCnnN6v17QY-0PAGLtxoItQ9_g9hg_ajoFuqi0rX_KSNXVET5_D6P68Yc-WAyuxH8ORzod9N-AVuWjrz4jXp52R18V8WzyyVfWwLPIV6wSkX8x655u0BpRGNyrl6IxvW6iRe86NEUa36DNotJbae8Wd0eCctk3rFHibpTNy89_tEHF3CF1fh-Pu9Dv9AfjdTN8</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine</title><source>IEEE Xplore All Conference Series</source><creator>Yuan, Geng ; Ma, Xiaolong ; Lin, Sheng ; Li, Zhengang ; Deng, Jieren ; Ding, Caiwen</creator><creatorcontrib>Yuan, Geng ; Ma, Xiaolong ; Lin, Sheng ; Li, Zhengang ; Deng, Jieren ; Ding, Caiwen</creatorcontrib><description>The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin-orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs since SOT-MRAM has the characteristic of near-zero standby power, high density, non-volatile. 
However, the drawbacks of SOT-MRAM based PIM engines such as high writing latency and requiring low bit-width data decrease its popularity as a favorable energy-efficient DNN accelerator. To mitigate these drawbacks, we propose an ultra-energy-efficient framework by using model compression techniques including weight pruning and quantization from the software level considering the architecture of SOT-MRAM PIM. And we incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee the solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced, while increasing the overall system performance rate (frame per second) in the meantime, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices. Our experimental results show the accuracy and compression rate of our proposed framework is consistently outperforming the reference works, while the efficiency (area & power) and performance rate of SOT-MRAM PIM engine is significantly improved.</description><identifier>EISSN: 2164-1706</identifier><identifier>EISBN: 172818746X</identifier><identifier>EISBN: 9781728187464</identifier><identifier>DOI: 10.1109/SOCC49529.2020.9524757</identifier><language>eng</language><publisher>IEEE</publisher><subject>Embedded systems ; Power demand ; Quantization (signal) ; System performance ; Torque ; Training ; Writing</subject><ispartof>2020 IEEE 33rd International System-on-Chip Conference (SOCC), 2020, 
p.37-42</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9524757$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,23930,23931,25140,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9524757$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yuan, Geng</creatorcontrib><creatorcontrib>Ma, Xiaolong</creatorcontrib><creatorcontrib>Lin, Sheng</creatorcontrib><creatorcontrib>Li, Zhengang</creatorcontrib><creatorcontrib>Deng, Jieren</creatorcontrib><creatorcontrib>Ding, Caiwen</creatorcontrib><title>A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine</title><title>2020 IEEE 33rd International System-on-Chip Conference (SOCC)</title><addtitle>SOCC</addtitle><description>The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin-orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs since SOT-MRAM has the characteristic of near-zero standby power, high density, non-volatile. However, the drawbacks of SOT-MRAM based PIM engines such as high writing latency and requiring low bit-width data decrease its popularity as a favorable energy-efficient DNN accelerator. 
To mitigate these drawbacks, we propose an ultra-energy-efficient framework by using model compression techniques including weight pruning and quantization from the software level considering the architecture of SOT-MRAM PIM. And we incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee the solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced, while increasing the overall system performance rate (frame per second) in the meantime, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices. Our experimental results show the accuracy and compression rate of our proposed framework is consistently outperforming the reference works, while the efficiency (area & power) and performance rate of SOT-MRAM PIM engine is significantly improved.</description><subject>Embedded systems</subject><subject>Power demand</subject><subject>Quantization (signal)</subject><subject>System performance</subject><subject>Torque</subject><subject>Training</subject><subject>Writing</subject><issn>2164-1706</issn><isbn>172818746X</isbn><isbn>9781728187464</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2020</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj9FKwzAYRqMguE2fQJC8QOqfNGmSy1I3nayruAnejTb5O6q2HYkge3sVd_Wdi8OBj5BbDgnnYO82VVFIq4RNBAhIfklqpc_IlGthuNEyezsnE8EzybiG7JJMY3wHkAqsmJCnnN6v17QY-0PAGLtxoItQ9_g9hg_ajoFuqi0rX_KSNXVET5_D6P68Yc-WAyuxH8ORzod9N-AVuWjrz4jXp52R18V8WzyyVfWwLPIV6wSkX8x655u0BpRGNyrl6IxvW6iRe86NEUa36DNotJbae8Wd0eCctk3rFHibpTNy89_tEHF3CF1fh-Pu9Dv9AfjdTN8</recordid><startdate>20200908</startdate><enddate>20200908</enddate><creator>Yuan, Geng</creator><creator>Ma, Xiaolong</creator><creator>Lin, Sheng</creator><creator>Li, Zhengang</creator><creator>Deng, Jieren</creator><creator>Ding, 
Caiwen</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20200908</creationdate><title>A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine</title><author>Yuan, Geng ; Ma, Xiaolong ; Lin, Sheng ; Li, Zhengang ; Deng, Jieren ; Ding, Caiwen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i203t-9dcdb3a0e487b531ec8dff0ae1d1188287fed60b7747dd51c870cc79bfc50d963</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Embedded systems</topic><topic>Power demand</topic><topic>Quantization (signal)</topic><topic>System performance</topic><topic>Torque</topic><topic>Training</topic><topic>Writing</topic><toplevel>online_resources</toplevel><creatorcontrib>Yuan, Geng</creatorcontrib><creatorcontrib>Ma, Xiaolong</creatorcontrib><creatorcontrib>Lin, Sheng</creatorcontrib><creatorcontrib>Li, Zhengang</creatorcontrib><creatorcontrib>Deng, Jieren</creatorcontrib><creatorcontrib>Ding, Caiwen</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yuan, Geng</au><au>Ma, Xiaolong</au><au>Lin, Sheng</au><au>Li, Zhengang</au><au>Deng, Jieren</au><au>Ding, Caiwen</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine</atitle><btitle>2020 IEEE 33rd International System-on-Chip Conference 
(SOCC)</btitle><stitle>SOCC</stitle><date>2020-09-08</date><risdate>2020</risdate><spage>37</spage><epage>42</epage><pages>37-42</pages><eissn>2164-1706</eissn><eisbn>172818746X</eisbn><eisbn>9781728187464</eisbn><abstract>The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin-orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs since SOT-MRAM has the characteristic of near-zero standby power, high density, non-volatile. However, the drawbacks of SOT-MRAM based PIM engines such as high writing latency and requiring low bit-width data decrease its popularity as a favorable energy-efficient DNN accelerator. To mitigate these drawbacks, we propose an ultra-energy-efficient framework by using model compression techniques including weight pruning and quantization from the software level considering the architecture of SOT-MRAM PIM. And we incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee the solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced, while increasing the overall system performance rate (frame per second) in the meantime, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices. 
Our experimental results show the accuracy and compression rate of our proposed framework is consistently outperforming the reference works, while the efficiency (area & power) and performance rate of SOT-MRAM PIM engine is significantly improved.</abstract><pub>IEEE</pub><doi>10.1109/SOCC49529.2020.9524757</doi><tpages>6</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2164-1706
ispartof	2020 IEEE 33rd International System-on-Chip Conference (SOCC), 2020, p.37-42
issn	2164-1706
language	eng
recordid	cdi_ieee_primary_9524757
source	IEEE Xplore All Conference Series
subjects	Embedded systems; Power demand; Quantization (signal); System performance; Torque; Training; Writing
title	A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T23%3A16%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20DNN%20Compression%20Framework%20for%20SOT-MRAM-based%20Processing-In-Memory%20Engine&rft.btitle=2020%20IEEE%2033rd%20International%20System-on-Chip%20Conference%20(SOCC)&rft.au=Yuan,%20Geng&rft.date=2020-09-08&rft.spage=37&rft.epage=42&rft.pages=37-42&rft.eissn=2164-1706&rft_id=info:doi/10.1109/SOCC49529.2020.9524757&rft.eisbn=172818746X&rft.eisbn_list=9781728187464&rft_dat=%3Cieee_CHZPO%3E9524757%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-9dcdb3a0e487b531ec8dff0ae1d1188287fed60b7747dd51c870cc79bfc50d963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9524757&rfr_iscdi=true