
MultiFuse: Efficient Cross Layer Fusion for DNN Accelerators with Multi-level Memory Hierarchy

Bibliographic Details
Main Authors: Chang, Chia-Wei, Liou, Jing-Jia, Huang, Chih-Tsun, Hsu, Wei-Chung, Lu, Juin-Ming
Format: Conference Proceeding
Language: English
Description
Summary: To facilitate the deployment of diverse deep learning models while maintaining scalability, modern DNN accelerators frequently employ reconfigurable structures such as Networks-on-Chip (NoC) and multi-level on-chip memory hierarchies. To achieve high energy efficiency, it is imperative to store intermediate DNN-layer results within the on-chip memory hierarchy, thereby reducing off-chip data transfers to and from DRAM. Two well-established optimization techniques, node fusion and loop tiling, have proven effective at retaining temporary results in the on-chip buffers and are commonly used to minimize off-chip DRAM accesses. In this paper, we introduce MultiFuse, an infrastructure designed to automatically explore fusion across multiple DNN layer nodes, enabling optimal utilization of the on-chip multi-level memory hierarchy. Experimental results demonstrate the effectiveness of our retargetable infrastructure, which outperforms Ansor's algorithm: our exploration approach achieves a 70% reduction in Energy-Delay Product (EDP) and a 67x speedup in search time when executing the data-intensive MobileNet model on a single DNN accelerator.
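For context, the following minimal sketch (not from the paper, and not the MultiFuse algorithm itself) illustrates the general idea behind cross-layer fusion with loop tiling that the summary describes: streaming one tile at a time through two chained layers lets each intermediate tile live in a small on-chip-sized buffer rather than being materialized in full and spilled to DRAM. The tile size and the two layer functions here are hypothetical choices for demonstration only.

```python
import numpy as np

TILE = 64  # hypothetical tile size, chosen to fit an on-chip buffer

def layer1(x):
    # Stand-in for a first DNN layer (e.g., a scale-and-bias stage).
    return 2.0 * x + 1.0

def layer2(x):
    # Stand-in for a second DNN layer (e.g., a ReLU stage).
    return np.maximum(x, 0.0)

def unfused(x):
    # Baseline: the entire intermediate tensor t is materialized
    # between the layers (in hardware terms, written out to DRAM).
    t = layer1(x)
    return layer2(t)

def fused_tiled(x):
    # Fused: each tile flows through both layers while resident
    # on-chip; only the final output tile is written back.
    out = np.empty_like(x)
    for i in range(0, x.shape[0], TILE):
        tile = x[i:i + TILE]            # load one input tile
        tile = layer1(tile)             # intermediate tile stays on-chip
        out[i:i + TILE] = layer2(tile)  # store only the final result
    return out

x = np.random.randn(1024).astype(np.float32)
assert np.allclose(unfused(x), fused_tiled(x))
```

The two versions compute identical results; the fused, tiled variant simply avoids ever holding the full intermediate tensor, which is the source of the DRAM-traffic savings the abstract refers to.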
ISSN: 2576-6996
DOI: 10.1109/ICCD58817.2023.00097