TetriX: Flexible Architecture and Optimal Mapping for Tensorized Neural Network Processing

Bibliographic Details
Published in: IEEE Transactions on Computers, May 2024, Vol. 73 (5), pp. 1219-1232
Main Authors: Zhang, Jie-Fang, Lu, Cheng-Hsun, Zhang, Zhengya
Format: Article
Language:English
Description
Summary: The continuous growth of deep neural network model size and complexity hinders the adoption of large models in resource-constrained platforms. Tensor decomposition has been shown effective in reducing the model size by large compression ratios, but the resulting tensorized neural networks (TNNs) require complex and versatile tensor shaping for tensor contraction, causing a low processing efficiency for existing hardware architectures. This work presents TetriX, a co-design of flexible architecture and optimal workload mapping for efficient and flexible TNN processing. TetriX adopts a unified processing architecture to support both inner and outer product. A hybrid mapping scheme is proposed to eliminate complex tensor shaping by alternating between inner and outer product in a sequence of tensor contractions. Finally, a mapping-aware contraction sequence search (MCSS) is proposed to identify the contraction sequence and workload mapping for achieving the optimal latency on TetriX. Remarkably, combining TetriX with MCSS outperforms the single-mode inner-product and outer-product baselines by up to 46.8× in performance across the collected TNN workloads. TetriX is the first work to support all existing tensor decomposition methods. Compared to a TNN accelerator designed for the hierarchical Tucker method, TetriX achieves improvements of 6.5× and 1.1× in inference throughput and efficiency, respectively.
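The abstract's key idea is that one tensor contraction can be evaluated either inner-product style (each output element computed as a full reduction) or outer-product style (partial results accumulated slice by slice over a contracted index), and that choosing per contraction avoids intermediate tensor reshaping. A minimal NumPy sketch of the equivalence of the two dataflows follows; the shapes, core names, and two-core tensor-train layout are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Illustrative assumption: two tensor-train (TT) cores G1, G2 contracted
# with an input tensor x. Shapes are arbitrary small examples.
rng = np.random.default_rng(0)
G1 = rng.standard_normal((1, 4, 3))   # core: (rank_in, mode_i, rank_out)
G2 = rng.standard_normal((3, 4, 1))   # core: (rank_in, mode_j, rank_out)
x = rng.standard_normal((4, 4))       # input tensor with modes (i, j)

# Inner-product dataflow: every output element is a full reduction
# over all contracted indices (i, j, b) at once.
y_inner = np.einsum('aib,bjc,ij->ac', G1, G2, x)

# Outer-product dataflow: accumulate partial results one slice of the
# contracted rank index b at a time, avoiding any tensor reshaping.
y_outer = np.zeros_like(y_inner)
for b in range(G1.shape[2]):
    partial = np.einsum('ai,ij->aj', G1[:, :, b], x)    # contract mode i
    y_outer += np.einsum('aj,jc->ac', partial, G2[b])   # contract mode j

# Both orderings compute the same contraction.
assert np.allclose(y_inner, y_outer)
```

The two loops realize the same sum Σ_{i,j,b} G1[a,i,b]·G2[b,j,c]·x[i,j]; a mapping search such as the paper's MCSS would pick, per contraction in the sequence, whichever dataflow gives lower latency on the hardware.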
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2024.3365936