AERO: Design Space Exploration Framework for resource-constrained CNN mapping on Tile-based Accelerators

Bibliographic Details
Published in:IEEE journal on emerging and selected topics in circuits and systems 2022-06, Vol.12 (2), p.1-1
Main Authors: Yang, Simei, Bhattacharjee, Debjyoti, Kumar, Vinay B. Y., Chatterjee, Saikat, De, Sayandip, Debacker, Peter, Verkest, Diederik, Mallik, Arindam, Catthoor, Francky
Format: Article
Language:English
Description
Summary: Analog In-Memory Compute (AIMC) arrays can store weights and perform matrix-vector multiplication operations for Deep Convolutional Neural Networks (CNNs). A number of recent efforts have integrated AIMC arrays into hybrid digital-analog accelerators in a multi-layer parallel manner to achieve energy efficiency and high throughput. Multi-layer parallelism on large-scale tile-based architectures needs efficient mapping support at both the processing element (PE) level (e.g., digital or analog processing elements) and the tile level. To find the most efficient architectures, fast and accurate design space exploration (DSE) support is required. In this paper, a novel DSE framework, AERO, is presented to characterize a CNN inference workload executing on hybrid tile-based architectures that support multi-layer parallelism. Our DSE framework has three main characteristics: (1) It presents a hierarchical Tile/PE-level mapping exploration strategy that includes inter-layer interaction and allows layer fusion/splitting configurations for PE-level mapping optimization. (2) It unlocks different Performance, Power and Area (PPA) exploration points under both sufficient and limited resource constraints, whereas the limited-resource case is not considered in prior work on multi-layer parallel architectures. The impact of weight loading and weight-stationary mapping is analyzed for better insight into hybrid tile-based architectures. (3) It incorporates a detailed PPA model that supports a broad range of hybrid digital and analog units in a tile. Experimental case studies are performed on realistic and relevant benchmarks, namely an MLP and several CNNs (LeNet-5 and ResNet-18, -34, -50 and -101).
ISSN:2156-3357
2156-3365
DOI:10.1109/JETCAS.2022.3171826