TRAM: An Open-Source Template-based Reconfigurable Architecture Modeling Framework

Bibliographic Details
Main Authors: Qiu, Yunhui; Cao, Yuhang; Dai, Yuan; Yin, Wenbo; Wang, Lingli
Format: Conference Proceeding
Language: English
Description
Summary: Coarse-grained reconfigurable architecture (CGRA) is a promising accelerator design choice due to its high performance and power efficiency in computation- or data-intensive application domains, such as security, multimedia, digital signal processing, machine learning, and high-performance computing. A CGRA consists of coarse-grained processing elements (PEs) and interconnects; the interconnects determine how flexibly the architecture can support different applications and also significantly affect performance and power efficiency. Although multiple types of interconnects have been proposed, a parameterized unified model is still lacking. In this paper, we propose a flexible and scalable CGRA template with a novel interconnect model that can unify the typical neighbor-to-neighbor, switch-based, and FPGA-like interconnects. Furthermore, we present TRAM, an open-source template-based reconfigurable architecture modeling framework that integrates Chisel-based CGRA modeling, architecture intermediate representation (IR) and Verilog generation, dataflow graph (DFG) mapping, simulation, and evaluation. The mapping flow contains graph-based placement and routing, critical-path-driven data synchronization, and simulated-annealing-based optimization. We evaluate the impact of the rich design parameters, and the results demonstrate the significance of such a flexible template in facilitating architecture optimization. Compared with related work, TRAM can achieve a 4.1× smaller DFG latency and a faster mapping speed for both the 8×8 and 16×16 CGRAs. Moreover, TRAM is able to attain an extremely high PE utilization of 94.4% on average by architecture tuning.
ISSN: 1946-1488
DOI: 10.1109/FPL57034.2022.00021
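
The summary names simulated-annealing-based optimization as one stage of the DFG mapping flow. As a rough illustration of that general technique only, the sketch below places DFG nodes on a rows × cols PE grid and anneals total Manhattan wirelength. It is not TRAM's mapper: the cost function, the swap move, the cooling schedule, and every name in it (`wirelength`, `anneal_placement`, `t0`, `alpha`) are assumptions made for this example, and routing and data synchronization are ignored.

```python
import math
import random


def wirelength(edges, place):
    """Sum of Manhattan distances between the PEs of connected DFG nodes."""
    return sum(
        abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
        for u, v in edges
    )


def anneal_placement(nodes, edges, rows, cols,
                     steps=20000, t0=5.0, alpha=0.9995, seed=0):
    """Hypothetical simulated-annealing placer (illustrative, not TRAM's)."""
    rng = random.Random(seed)
    nodes = list(nodes)
    slots = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(slots):
        raise ValueError("more DFG nodes than PEs")

    # Random initial placement: one node per PE, remaining PEs stay empty.
    rng.shuffle(slots)
    occupant = {slot: None for slot in slots}
    place = {}
    for node, slot in zip(nodes, slots):
        place[node] = slot
        occupant[slot] = node

    cost = wirelength(edges, place)
    best_cost, best_place = cost, dict(place)
    temp = t0

    for _ in range(steps):
        node = rng.choice(nodes)
        target = rng.choice(slots)
        src = place[node]
        if target != src:
            # Move the node to the target PE, swapping with any occupant.
            other = occupant[target]
            place[node], occupant[target], occupant[src] = target, node, other
            if other is not None:
                place[other] = src

            delta = wirelength(edges, place) - cost
            # Metropolis rule: always accept improvements, sometimes accept
            # a worse placement with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost += delta
                if cost < best_cost:
                    best_cost, best_place = cost, dict(place)
            else:
                # Rejected: undo the move.
                place[node], occupant[src], occupant[target] = src, node, other
                if other is not None:
                    place[other] = target
        # Geometric cooling with a small floor to avoid division by zero.
        temp = max(temp * alpha, 1e-3)

    return best_place, best_cost


if __name__ == "__main__":
    # A four-node chain DFG mapped onto a 2x2 grid.
    chain_nodes = ["a", "b", "c", "d"]
    chain_edges = [("a", "b"), ("b", "c"), ("c", "d")]
    placement, cost = anneal_placement(chain_nodes, chain_edges, 2, 2)
    print(placement, cost)
```

A real mapper would fold routing congestion and latency into the cost rather than wirelength alone; the wirelength-only cost here is chosen purely to keep the sketch short.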