A 16 nJ/Classification FPGA-Based Wired-Logic DNN Accelerator Using Fixed-Weight Non-Linear Neural Net
Published in: | IEEE journal on emerging and selected topics in circuits and systems 2021-12, Vol.11 (4), p.751-761 |
---|---|
Format: | Article |
Language: | English |
Summary: | A reconfigurable field-programmable gate array (FPGA)-based wired-logic deep neural network (DNN) accelerator is presented. The wired-logic architecture achieves a high energy efficiency of 16 nJ/classification on the Modified National Institute of Standards and Technology (MNIST) dataset. Each neuron in the neural network consists of combinational circuits only, and all neurons are implemented on the FPGA. Intermediate data are never stored in memory or registers; they reach the output stage by passing through neuron cells only. This minimizes the latency and power spent on memory access, enabling low-power, high-throughput operation. The critical technical issue is reducing hardware resources, because all neurons must fit on an FPGA whose resources are limited. Two core technologies were developed to minimize the required hardware resources: (1) a neural network with a small number of neurons in which the weights of all synapses are fixed to a common value, and (2) a small neuron cell circuit consisting of an adder and a look-up table containing an activation function. Fixing all weights to a common value simplifies processing from multiply-accumulate operations to additions alone, minimizing the hardware resources required for each neuron. An experiment with the MNIST dataset on a 28-nm FPGA confirmed a power consumption of 0.16 W and a latency of 100 ns per inference (16 nJ/classification). At the same recognition accuracy, the power efficiency is 45.6 times higher than that of the conventional state-of-the-art binarized DNN accelerator implemented as a digital ASIC. |
ISSN: | 2156-3357 2156-3365 |
DOI: | 10.1109/JETCAS.2021.3114179 |
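
The summary describes a neuron cell built from an adder and a look-up table, and reports 0.16 W at 100 ns per inference. The following is a minimal Python sketch of that idea, not the authors' circuit. The function names, the clipped-ramp activation, the threshold, and the value ranges are all assumptions made for illustration. The sketch assumes the common weight is folded into the look-up table, so the neuron only sums its inputs.

```python
# Illustrative sketch only: every name, the activation shape, and the value
# ranges below are assumptions and do not come from the article.

def make_lut(sum_max: int, threshold: int, out_max: int) -> list[int]:
    """Tabulate an assumed clipped-ramp activation for every adder output."""
    return [min(out_max, max(0, s - threshold)) for s in range(sum_max + 1)]


def neuron(inputs: list[int], lut: list[int]) -> int:
    """Fixed-weight neuron: one addition stage followed by a table look-up.

    With a single common weight w, sum(w * x) equals w * sum(x), so no
    per-synapse multiplication is needed; w is assumed to be absorbed
    into the table contents.
    """
    total = sum(inputs)                    # adder stage, no multipliers
    return lut[min(total, len(lut) - 1)]   # clamp to the table range


def energy_per_classification_nj(power_w: float, latency_ns: float) -> float:
    """Energy in nanojoules: watts times nanoseconds gives nanojoules."""
    return power_w * latency_ns


if __name__ == "__main__":
    lut = make_lut(sum_max=15, threshold=4, out_max=7)
    print(neuron([1, 2, 3], lut))
    # Figures reported in the summary: 0.16 W and 100 ns per inference.
    print(energy_per_classification_nj(0.16, 100))
```

The energy helper only restates the arithmetic behind the reported figure: 0.16 W multiplied by 100 ns is 16 nJ per classification.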