A 101.4 GOPS/W Reconfigurable and Scalable Control-Centric Embedded Processor for Domain-Specific Applications

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2016-12, Vol. 63 (12), p. 2245-2256
Main Authors: Huan, Yuxiang, Ma, Ning, Mao, Jia, Blixt, Stefan, Lu, Zhonghai, Zou, Zhuo, Zheng, Li-Rong
Format: Article
Language: English
Description
Summary: Adapting the processor to the target application is essential in the Internet-of-Things (IoT), and thus requires customizability in order to improve energy efficiency and scalability to provide sufficient performance. In this paper, a reconfigurable and scalable control-centric architecture is proposed, and a processor consisting of two cores and an on-chip multi-mode router is implemented. Reconfigurability is enabled by a programmable sequence mapping table (SMT) which reorganizes functional units in each cycle, thus increasing hardware utilization and reducing excessive data movement for high energy efficiency. The router facilitates both wormhole and circuit switching to construct intra- or inter-chip interconnections, providing scalable performance. Fabricated in a 65-nm process, the chip exhibits 101.4 GOPS/W energy efficiency with a die size of 3.5 mm². The processor carries out general-purpose processing with a code size 29% smaller than the ARM Cortex M4, and improves the performance of application-specific processing by over ten times when implementing AES and RSA using SMTs instead of general-purpose C. By utilizing the on-chip router, the processor can be interconnected up to 256 nodes, with a single link bandwidth of 1.4 Gbps.
ISSN: 1549-8328
1558-0806
DOI: 10.1109/TCSI.2016.2616363