DiBA: n-Dimensional Bitslice Architecture for LSTM Implementation
| Field | Value |
| --- | --- |
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Subjects | |
| Online Access | Request full text |
| ISSN | 2473-2117 |
| DOI | 10.1109/DDECS50862.2020.9095614 |
Summary: A hardware architecture for implementing LSTM neural networks that can be scaled to the size of the problem is proposed here. Implementing an LSTM application requires iterating multiplications, additions, and activation functions over the stream of input data. To handle these iterations, bitslicing is used to cascade as many slices as the problem size demands for optimum performance. To avoid a long linear array of MAC slices, which would require large adders, the slices are arranged into an n-dimensional structure. Such a structure turns the adder units into slices of their own, which also operate concurrently with the rest of the hardware in pipeline fashion. This paper presents this bitslice architecture, which can serve as the fabric for a programmable, general-purpose LSTM implementation. The paper also describes an FPGA implementation that uses on-chip FPGA RAM for the memory the LSTM requires. The work is compared with other designs that do not consider multidimensional structures, as well as with one that considers multidimensional cascading; in both cases, the proposed structure is faster and uses smaller adder structures.
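For context, the multiply-add-activate pattern the summary refers to is the standard LSTM cell computation. The equations below are the textbook formulation, shown here for reference rather than excerpted from the paper:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$

Every matrix-vector product here is a stream of multiply-accumulate operations, which is what the cascaded MAC slices implement, while $\sigma$ and $\tanh$ account for the activation-function units.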
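As a minimal sketch of why the n-dimensional arrangement shrinks the adders, the following Python model reduces MAC partial products through a hierarchy of small adder stages instead of one wide accumulation. This is an illustrative software model, not the paper's hardware design; names such as `mac_slices`, `adder_stage`, and the `group` parameter are invented for the example.

```python
# Software model of the reduction pattern described in the summary:
# partial products from MAC slices are summed by log-depth stages of
# small adders rather than by one long linear accumulation chain.

def mac_slices(weights, inputs):
    """Each product models the output of one MAC slice in a given cycle."""
    return [w * x for w, x in zip(weights, inputs)]

def adder_stage(values, group):
    """Model one adder-slice stage: sum small groups of neighboring values.

    In hardware, each stage would be a slice of its own, operating
    concurrently with the MAC slices on the next wave of input data.
    """
    return [sum(values[i:i + group]) for i in range(0, len(values), group)]

def tree_reduce(values, group=2):
    """Reduce partial products through successive adder stages.

    A linear array of N MAC slices would need one wide N-input
    accumulation; arranging the slices hierarchically replaces it with
    ceil(log_group(N)) stages of small group-input adders.
    """
    while len(values) > 1:
        values = adder_stage(values, group)
    return values[0]

if __name__ == "__main__":
    w = [3, -1, 4, 1, -5, 9, 2, 6]
    x = [2, 7, 1, 8, 2, 8, 1, 8]
    partials = mac_slices(w, x)
    # The tree reduction must agree with a plain dot product.
    assert tree_reduce(partials, group=2) == sum(partials)
    print(tree_reduce(partials, group=2))
```

With `group = g` and `N` slices, each adder handles only `g` inputs and the reduction finishes in about log base `g` of `N` pipelined stages, which matches the summary's claim of smaller adders operating concurrently with the rest of the hardware.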