FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism

Bibliographic Details
Published in: ACM Transactions on Reconfigurable Technology and Systems, 2024-03, Vol. 17 (1), p. 1-26, Article 4
Main Authors: Qiu, Yunhui; Mao, Yiqing; Gao, Xuchen; Chen, Sichao; Li, Jiangnan; Yin, Wenbo; Wang, Lingli
Format: Article
Language: English
Description
Summary: Coarse-grained reconfigurable architectures (CGRAs) have emerged as promising accelerators due to their high flexibility and energy efficiency. However, existing open-source works often lack integration of CGRAs with CPU systems and the corresponding toolchains. Moreover, accelerator instruction pipelining that overlaps data communication, computation, and configuration across multiple tasks is rarely supported. In this article, we propose FDRA, an open-source exploration framework for a heterogeneous system-on-chip (SoC) with a RISC-V processor and a dynamically reconfigurable accelerator (DRA) supporting loop-, instruction-, and task-level parallelism. FDRA encompasses parameterized SoC modeling, Verilog generation, source-to-source application code transformation using frontend and DRA compilers, SoC simulation, and FPGA prototyping. FDRA extracts periodic accumulative operators and multi-dimensional linear load/store operators from nested loops. The DRA can access the shared L2 cache with virtual addresses and supports direct memory access with arbitrary start addresses and data lengths. Integrated into the RISC-V Rocket SoC, our DRA achieves a remarkable 55× acceleration for loop kernels and improves energy efficiency by 29×. Compared to state-of-the-art RISC-V vector units, our DRA demonstrates a 2.9× speed improvement and 3.5× greater energy efficiency. Compared to previous CGRA+RISC-V SoCs, our SoC achieves a minimum speedup of 5.2×.
ISSN: 1936-7406, 1936-7414
DOI: 10.1145/3614224