Analog Computing in Memory (CIM) Technique for General Matrix Multiplication (GEMM) to Support Deep Neural Network (DNN) and Cosine Similarity Search Computing using 3D AND-type NOR Flash Devices
The massive growth of data in the computing world inspires R&D on novel memory-centric computing architectures and devices. In this work, we propose a novel analog CIM technique for GEMM using 3D NOR Flash devices to support general-purpose matrix multiplication. Our analysis indicates that it is very robust to use "billions" of memory cells with modest 4-level, large-spacing analog cell currents (Icell) to achieve good accuracy and reliability, contrary to the past approach of pursuing many levels per memory cell, which inevitably suffers accuracy loss. We estimate that a 2.7 Gb 3D NOR GEMM can provide high-performance (>300 frames/s) image-recognition inference of ResNet-50 on the ImageNet dataset, using a simple, flexible controller chip with 1 MB of SRAM and no need for a massive ALU or external DRAM. Accuracy is maintained at ~85% for CIFAR-10 (with VGG7) and ~90% for ImageNet Top-5 (with ResNet-50) under good device control. This 3D NOR GEMM offers much lower system cost and greater flexibility than a complicated SoC. We also propose an operation and design method for "cosine similarity" computing using the 3D NOR. A ternary similarity-search algorithm with positive and negative inputs and weights can perform high-dimensional feature-vector similarity computing (e.g., 512 dimensions for face recognition with FaceNet on the VGGFace2 dataset) in a high-parallelism CIM design (512 WL inputs, 1024 BLs, at Tread = 100 ns). High search accuracy (~97.8%, almost identical to the 98% of software computing) and high internal search bandwidth (~5 Tb/s per chip) are achieved. This in-Flash search accelerator has the potential to enable new hardware-aware search algorithms in big-data retrieval applications.
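The paper's claim that modest 4-level cells with large spacing suffice for accurate GEMM can be illustrated with a small software model of a quantized matrix-vector product. This is a hedged sketch only: the level values, array dimensions, and scaling are illustrative assumptions, not parameters from the paper.

```python
import random

# Four well-spaced cell levels standing in for the analog Icell states
# (illustrative values; real cell currents and spacing are device-specific).
LEVELS = [0.0, 1.0, 2.0, 3.0]

def quantize_4level(w, w_max):
    """Map a weight in [0, w_max] onto the nearest of the four cell levels."""
    scaled = w / w_max * LEVELS[-1]
    return min(LEVELS, key=lambda level: abs(level - scaled))

random.seed(1)
rows, cols, w_max = 64, 512, 1.0
W = [[random.random() for _ in range(cols)] for _ in range(rows)]
x = [random.random() for _ in range(cols)]

# Ideal floating-point GEMV versus the 4-level "CIM" version, where each
# output is the sum of (quantized cell) x (input) products on one bit line.
y_float = [sum(W[i][j] * x[j] for j in range(cols)) for i in range(rows)]
y_cim = [w_max / LEVELS[-1] *
         sum(quantize_4level(W[i][j], w_max) * x[j] for j in range(cols))
         for i in range(rows)]

# With 512 cells summed per output, per-cell quantization error averages out.
worst_rel_err = max(abs(f - c) / f for f, c in zip(y_float, y_cim))
```

Even with only four levels per cell, the long bit-line sums let the per-cell quantization errors largely cancel, keeping the relative output error small; that is the intuition behind using billions of coarse, well-separated cells rather than a few many-level cells.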
Main Authors: | Wei, Ming-Liang; Lue, Hang-Ting; Ho, Shu-Yin; Lin, Yen-Po; Hsu, Tzu-Hsuan; Hsieh, Chih-Chang; Li, Yung-Chun; Yeh, Teng-Hao; Chen, Shih-Hung; Jhu, Yi-Hao; Li, Hsiang-Pang; Hu, Han-Wen; Hung, Chun-Hsiung; Wang, Keh-Chung; Lu, Chih-Yuan |
---|---|
Format: | Conference Proceeding |
Language: | English |
container_start_page | 33.3.1 |
---|---|
container_end_page | 33.3.4 |
creator | Wei, Ming-Liang; Lue, Hang-Ting; Ho, Shu-Yin; Lin, Yen-Po; Hsu, Tzu-Hsuan; Hsieh, Chih-Chang; Li, Yung-Chun; Yeh, Teng-Hao; Chen, Shih-Hung; Jhu, Yi-Hao; Li, Hsiang-Pang; Hu, Han-Wen; Hung, Chun-Hsiung; Wang, Keh-Chung; Lu, Chih-Yuan |
description | The massive growth of data in the computing world inspires R&D on novel memory-centric computing architectures and devices. In this work, we propose a novel analog CIM technique for GEMM using 3D NOR Flash devices to support general-purpose matrix multiplication. Our analysis indicates that it is very robust to use "billions" of memory cells with modest 4-level, large-spacing analog cell currents (Icell) to achieve good accuracy and reliability, contrary to the past approach of pursuing many levels per memory cell, which inevitably suffers accuracy loss. We estimate that a 2.7 Gb 3D NOR GEMM can provide high-performance (>300 frames/s) image-recognition inference of ResNet-50 on the ImageNet dataset, using a simple, flexible controller chip with 1 MB of SRAM and no need for a massive ALU or external DRAM. Accuracy is maintained at ~85% for CIFAR-10 (with VGG7) and ~90% for ImageNet Top-5 (with ResNet-50) under good device control. This 3D NOR GEMM offers much lower system cost and greater flexibility than a complicated SoC. We also propose an operation and design method for "cosine similarity" computing using the 3D NOR. A ternary similarity-search algorithm with positive and negative inputs and weights can perform high-dimensional feature-vector similarity computing (e.g., 512 dimensions for face recognition with FaceNet on the VGGFace2 dataset) in a high-parallelism CIM design (512 WL inputs, 1024 BLs, at Tread = 100 ns). High search accuracy (~97.8%, almost identical to the 98% of software computing) and high internal search bandwidth (~5 Tb/s per chip) are achieved. This in-Flash search accelerator has the potential to enable new hardware-aware search algorithms in big-data retrieval applications. |
doi_str_mv | 10.1109/IEDM45625.2022.10019495 |
format | conference_proceeding |
identifier | EISSN: 2156-017X |
ispartof | 2022 International Electron Devices Meeting (IEDM), 2022, p.33.3.1-33.3.4 |
issn | 2156-017X |
language | eng |
source | IEEE Xplore All Conference Series |
subjects | Common Information Model (computing); Memory architecture; Neural networks; Performance evaluation; Random access memory; Software algorithms; Three-dimensional displays |
title | Analog Computing in Memory (CIM) Technique for General Matrix Multiplication (GEMM) to Support Deep Neural Network (DNN) and Cosine Similarity Search Computing using 3D AND-type NOR Flash Devices |
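The ternary cosine-similarity search named in the title can be sketched in plain software. This is an illustrative model with synthetic vectors: the threshold, database size, and noise level are assumptions chosen for the demo, not values reported in the paper.

```python
import math
import random

def ternarize(v, thr=0.5):
    # Keep only strong components as +1/-1 and zero out the rest -- the
    # positive/negative inputs and weights of the ternary search scheme.
    return [0 if abs(x) < thr else (1 if x > 0 else -1) for x in v]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

random.seed(0)
dim = 512  # e.g., the 512-d face embeddings mentioned in the abstract
query = [random.gauss(0.0, 1.0) for _ in range(dim)]
database = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(64)]
database[17] = [x + random.gauss(0.0, 0.3) for x in query]  # one true match

# CIM-style ternary scores: each is a signed count that the array could
# accumulate as summed bit-line currents across 512 word-line inputs.
tq = ternarize(query)
scores = [sum(a * b for a, b in zip(tq, ternarize(d))) for d in database]

best_ternary = max(range(len(database)), key=lambda i: scores[i])
best_float = max(range(len(database)), key=lambda i: cosine(query, database[i]))
```

The ternary score is only a coarse proxy for true cosine similarity, but for well-separated embeddings both rankings agree on the best match, which is consistent with the near-software search accuracy (~97.8% vs 98%) the paper reports.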