Loading…
Toward Energy-Efficient Stochastic Circuits Using Parallel Sobol Sequences
Stochastic computing (SC) often requires long stochastic sequences and, thus, a long latency to achieve accurate computation. The long latency leads to an inferior performance and low energy efficiency compared with most conventional binary designs. In this paper, a type of low-discrepancy sequences...
Saved in:
Published in: | IEEE transactions on very large scale integration (VLSI) systems 2018-07, Vol.26 (7), p.1326-1339 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Stochastic computing (SC) often requires long stochastic sequences and, thus, a long latency to achieve accurate computation. The long latency leads to an inferior performance and low energy efficiency compared with most conventional binary designs. In this paper, a type of low-discrepancy sequences, the Sobol sequence, is considered for use in SC. Compared to the use of pseudorandom sequences generated by linear feedback shift registers (LFSRs), the use of Sobol sequences improves the accuracy of stochastic computation with a reduced sequence length. The inherent feature in Sobol sequence generators enables the parallel implementation of random number generators with an improved performance and hardware efficiency. In particular, the underlying theory is formulated and circuit design is proposed for an arbitrary level of parallelization in a power of 2. In addition, different strategies are implemented for parallelizing combinational and sequential stochastic circuits. The hardware efficiency of the parallel stochastic circuits is measured by energy per operation (EPO), throughput per area (TPA), and runtime. At a similar accuracy, the 8{\times} parallel stochastic circuits using Sobol sequences consume approximately 1% of the EPO of the conventional LFSR-based nonparallelized circuits. Meanwhile, an average of 70 (up to 89) times improvements in TPA and less than 1% runtime are achieved. A sorting network is implemented for a median filter (MF) as an application. For a similar image processing quality, a higher energy efficiency is obtained for an 8{\times} parallelized stochastic MF compared with its binary counterpart. |
---|---|
ISSN: | 1063-8210 1557-9999 |
DOI: | 10.1109/TVLSI.2018.2812214 |