R-HLS: An IR for Dynamic High-Level Synthesis and Memory Disambiguation based on Regions and State Edges

Bibliographic Details
Published in:	arXiv.org 2024-08
Main Authors:	Metz, David; Reissmann, Nico; Själander, Magnus
Format:	Article
Language:	English
Subjects:	Distributed memory; Graph theory; Hardware; High level synthesis; Lookup tables; Resource utilization

Description
Summary:	Dynamically scheduled hardware enables high-level synthesis (HLS) for applications with irregular control flow and latencies, which perform poorly with conventional statically scheduled approaches. Since dynamically scheduled hardware is inherently data-flow based, it is beneficial to have an intermediate representation (IR) that captures the global data flow to enable easier transformations. State-of-the-art dynamic HLS approaches use control-flow-based IRs, which model data flow only at the basic block level, requiring the rediscovery of inter-block parallelism. The Regionalized Value State Dependence Graph (RVSDG) is an IR that models (1) control flow as part of the global data flow using regions and (2) memory dependencies using state edges. We propose R-HLS, a new RVSDG dialect targeting dynamic high-level synthesis. R-HLS explicitly models control flow decisions, routing, and memory, which are only abstractly represented in the RVSDG. Expressing the control flow as part of the data flow reduces the need for complex optimizations to extract performance and enables easy conversion to parallel circuits. Furthermore, we present a distributed memory disambiguation optimization that leverages memory state edges to decouple address generation from data accesses, resulting in resource-efficient out-of-program-order execution of memory operations. Our results show that R-HLS effectively exposes parallelism, resulting in fewer executed cycles and a 10% speedup on average compared to the state of the art in dynamic HLS with optimized memory disambiguation. These results are achieved with a significant reduction in resource utilization: on average, a 79% reduction in lookup tables and a 22% reduction in flip-flops.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2408.08712