Loading…
SCC based modulo scheduling for coarse-grained reconfigurable processors
Coarse-grained reconfigurable arrays (CGRAs) architectures aim to offer high performance at low power consumption, especially for digital signal processing and streaming applications. To fully exploit the computing capability of CGRA, it is essential to develop a scheduling algorithm which maps oper...
Saved in:
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Coarse-grained reconfigurable arrays (CGRAs) architectures aim to offer high performance at low power consumption, especially for digital signal processing and streaming applications. To fully exploit the computing capability of CGRA, it is essential to develop a scheduling algorithm which maps operations over processing elements in CGRA. Modulo scheduling [1] is known as the state-of-art algorithm for CGRA scheduling, and there are many variants [2][3][4]. However, they suffer from dealing with inter-iteration dependences called as recurrences that form cyclic dependences. Hence we propose a new scheduling technique that efficiently handles the cyclic dependences. The key techniques are grouping all the mutually-dependent recurrence cycles into a strongly connected component (SCC), and scheduling the input data flow graph (DFG) based on SCCs. Since grouping removes all the recurrence cycles from DFG, the resulting SCC graph becomes a form of directed acyclic graph (DAG) in which the scheduler can track the total order of SCCs. While processing SCCs one by one, our intra-SCC scheduler analyzes the dependences between every pair of two different operations inside of SCC and produces the schedule of them. Thanks to the well-structured form of the SCC-based graph, we obtain more efficient schedule compared to the previous CGRA scheduling algorithm [2]. The experimental results show that the proposed technique enhances the performance of recurrence-dominant loops up to 3.5X and raises the success rate of modulo-scheduling compared to the previous CGRA scheduling algorithm [2]. |
---|---|
DOI: | 10.1109/FPT.2012.6412156 |