Loading…

MiniFloats on RISC-V Cores: ISA Extensions With Mixed-Precision Short Dot Products

Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employ...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on emerging topics in computing 2024, Vol.12 (4), p.1040-1055
Main Authors: Bertaccini, Luca, Paulin, Gianna, Cavalcante, Matheus, Fischer, Tim, Mach, Stefan, Benini, Luca
Format: Article
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods increasing the precision in compound additions, such as stochastic rounding. So far, hardware implementations supporting FP8 are mostly implemented within domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions, enhancing respectively scalar and vector general-purpose cores with low and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions accumulating at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores, as well as a cluster of vector cores, and implement them in a 12 nm FinFET technology. The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V, 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V, 1.08 GHz, 1.93x higher efficiency than computing on FP16-to-FP32.
ISSN:2168-6750
2168-6750
DOI:10.1109/TETC.2024.3365354