A tree-recursive partitioned multicast mechanism for NoC-based deep neural network accelerator

Bibliographic Details
Published in: Microelectronics Journal, 2024-05, Vol. 147, Article 106161
Main Authors: Ouyang, Yiming; Zhang, Yihe; Liang, Huaguo; Li, Jianhua
Format: Article
Language: English
Description
Summary: In chip multiprocessor systems (CMPs), Network on Chip (NoC) has been widely used for its reusability, high reliability, and low power consumption. Recently, using NoC platforms to accelerate deep neural networks (DNNs) has become a new trend. This design allows the intermediate computation results of DNNs to be transmitted within the chip, reducing the number of accesses to off-chip memory. However, the large amount of one-to-many traffic in a DNN accelerator occupies system bandwidth, which significantly reduces the performance of an NoC platform designed primarily for one-to-one traffic. To address this issue, we propose a tree-based recursive partitioning multicast scheme (TRPM), which increases path diversity and improves system bandwidth. We also design a single-cycle per-hop router architecture, which enhances the transmission efficiency of multicast packets. Detailed simulation results show that, compared with the latest tree-based multicast algorithm for DNN accelerators, our scheme reduces the number of routed packets by 35%, the classification latency by 13.5%, and the average packet latency by 14.5% on average.
• We design a multicast routing algorithm based on recursive partitioning.
• We improve the router architecture to provide single-cycle transmission for packets.
• Our method reduces the number of packets, the classification latency, and the packet latency.
ISSN: 1879-2391
DOI: 10.1016/j.mejo.2024.106161