
Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories

Processing-in-memory (PIM) is the most promising paradigm to address the bandwidth bottleneck in deep neural network (DNN) accelerators. However, the algorithmic and dataflow structure of DNNs still necessitates moving a large amount of data across banks inside the memory device to bring input data and their corresponding model parameters together, negatively shifting part of the bandwidth bottleneck to the in-memory data communication infrastructure. To alleviate this bottleneck, we present Smart Memory, a highly parallel in-memory DNN accelerator for 3D memories that benefits from a scalable high-bandwidth in-memory network. Whereas existing PIM designs implement the compute units and network-on-chip on the logic die of the underlying 3D memory, in Smart Memory the computation and data transmission tasks are distributed across the memory banks. To this end, each memory bank is equipped with (1) a very simple processing unit to run neural networks, and (2) a circuit-switched router that interconnects the memory banks through a 3D network-on-memory. Our evaluation shows a 44% average performance improvement over state-of-the-art in-memory DNN accelerators.
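The abstract's key architectural idea is that each bank of the 3D-stacked memory gets a small processing unit and a circuit-switched router, so a DNN layer can be computed where its weights live instead of streaming everything to the logic die. The sketch below is a minimal, illustrative model of that idea only, not code from the paper: it splits a fully connected layer's weight rows across a hypothetical 3D mesh of bank nodes, lets each bank compute its partial output locally, and stands in for the network-on-memory with a simple gather. All class and function names, the mesh dimensions, and the layer sizes are assumptions.

# Illustrative sketch (not from the paper): distributing a DNN layer across
# the banks of a 3D-stacked memory, where each bank holds a slice of the
# weights and computes its partial result locally.
import numpy as np

class BankPE:
    """One memory bank with a very simple processing unit (hypothetical model)."""
    def __init__(self, coord, weight_slice):
        self.coord = coord            # (layer, row, col) position in the 3D stack
        self.weights = weight_slice   # slice of the weight matrix stored in this bank

    def compute(self, x):
        # Local multiply-accumulate over the bank's weight slice.
        return self.weights @ x

def run_layer(weights, x, mesh=(2, 2, 2)):
    """Split a fully connected layer's output rows across a 3D mesh of banks
    and gather the partial results; the gather stands in for routing over the
    in-memory network."""
    n_banks = mesh[0] * mesh[1] * mesh[2]
    row_slices = np.array_split(weights, n_banks, axis=0)
    coords = [(z, y, c) for z in range(mesh[0])
                        for y in range(mesh[1])
                        for c in range(mesh[2])]
    banks = [BankPE(coord, w) for coord, w in zip(coords, row_slices)]
    partials = [bank.compute(x) for bank in banks]
    return np.concatenate(partials)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))
    x = rng.standard_normal(32)
    # The distributed result matches a single dense matrix-vector product.
    assert np.allclose(run_layer(W, x), W @ x)

Running the module checks that the per-bank partial results recombine into the same answer as a dense matrix-vector product; the paper's reported 44% average improvement comes from avoiding cross-bank data movement, which this toy model does not attempt to capture.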

Bibliographic Details
Published in: IEEE Computer Architecture Letters, 2024-01, Vol. 23 (1), p. 137-141
Main Authors: Rezaei, Seyyed Hossein SeyyedAghaei; Moghaddam, Parham Zilouchian; Modarressi, Mehdi
Format: Article
Language:English
Subjects: 3D-stacked memory; Accelerators; Artificial neural networks; Bandwidth; Bottlenecks; Computer architecture; Data communication; Data transmission; Deep learning accelerator; Distributed memory; Machine learning; Memory devices; Memory management; Network-on-memory; Neural networks; Processing-in-memory; Random access memory; Switches; System on chip; Three-dimensional displays
DOI: 10.1109/LCA.2023.3287976
ISSN: 1556-6056
EISSN: 1556-6064
Online Access: IEEE Electronic Library (IEL) Journals