A 1 GHz Hardware Loop-Accelerator With Razor-Based Dynamic Adaptation for Energy-Efficient Operation
Published in: | IEEE Transactions on Circuits and Systems I: Regular Papers, 2014-08, Vol. 61 (8), p. 2290-2298 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Dynamic adaptation using Razor-based detection and correction of timing errors has demonstrated substantial improvements in performance and energy efficiency in microprocessors. In this work, we apply Razor to hardware accelerators, which find increasing application in System-on-Chip designs with high-performance requirements that must be delivered under stringent power budgets. We describe the implementation of, and silicon measurement results from, a Razor-based hardware loop-accelerator (RZLA) implementing the Sobel edge-detection algorithm. Unlike in microprocessors, the RZLA pipeline is datapath-dominated, with statically scheduled control and queue-based storage structures that are simply extended to support check-pointing and recovery. We exploit these characteristics, typical of DSP and image-processing accelerators, to implement Razor recovery in a manner that is amenable to RTL validation and verification. We show a low-overhead, pulsed-latch-based Razor Flip-flop (RFF) architecture that adds only a single extra transistor on the clock to minimize clock power overhead. The RFF is deployed in conjunction with a level-sensitive, latch-insertion-based algorithm to address the minimum-delay constraint present in all Razor systems. This algorithm enables the use of 50% of the clock period for timing speculation, leading to robust error detection and correction across a wide dynamic voltage- and frequency-scaling range. Fabricated in 65 nm CMOS, the RZLA reclaims voltage margins to demonstrate energy-efficiency improvements of 34% on a per-device basis and 33% overall for the entire batch of devices at 1 GHz operation. |
ISSN: | 1549-8328, 1558-0806 |
DOI: | 10.1109/TCSI.2014.2333332 |
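
For readers unfamiliar with the kernel named in the summary, the following is a minimal software sketch of Sobel edge detection. It is a reference illustration only and says nothing about how the RZLA datapath is organized in hardware. The function name `sobel_magnitude`, the `|Gx| + |Gy|` magnitude approximation, and the zeroed border are assumptions made for this example, not details taken from the paper.

```python
# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_magnitude(image):
    """Return the |Gx| + |Gy| gradient magnitude of a 2-D list of pixels.

    Border pixels are left at 0 (an assumption of this sketch), since the
    3x3 window does not fit there.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            # Accumulate both gradients over the 3x3 neighbourhood.
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += GX[dr][dc] * pixel
                    gy += GY[dr][dc] * pixel
            out[r][c] = abs(gx) + abs(gy)
    return out


# Hypothetical 4x4 test image with a vertical edge between columns 1 and 2.
edge_image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(edge_image)
```

The fixed loop bounds and the absence of data-dependent branching in the inner computation illustrate why such kernels suit a statically scheduled, datapath-dominated pipeline of the kind the summary describes.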