MiniFloats on RISC-V Cores: ISA Extensions With Mixed-Precision Short Dot Products
Published in: IEEE Transactions on Emerging Topics in Computing, 2024, Vol. 12, No. 4, pp. 1040-1055
Format: Article
Language: English
Summary: Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods that increase the precision of compound additions, such as stochastic rounding. So far, hardware support for FP8 has mostly been implemented within domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions, enhancing scalar and vector general-purpose cores, respectively, with low- and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions that accumulate at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores as well as a cluster of vector cores and implement both in a 12 nm FinFET technology. The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V and 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V and 1.08 GHz, a 1.93x higher efficiency than computing on FP16-to-FP32.
ISSN: 2168-6750
DOI: 10.1109/TETC.2024.3365354
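The summary above mentions two ideas that are easy to illustrate in software, even though the paper itself describes ISA extensions and hardware: dot-product instructions that take narrow FP operands but accumulate in a wider format, and stochastic rounding to recover precision in compound additions. The sketch below is not the paper's ISA or FPU; it is a minimal NumPy emulation under assumed stand-ins: float16 inputs with a float32 accumulator model the narrow-in/wide-accumulate dot product, and a bit-masking helper models stochastic rounding from float32 to a bfloat16 grid. All function names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def stochastic_round_fp32_to_bf16(x, rng):
    """Stochastically round float32 values onto the bfloat16 grid (kept in float32).

    bfloat16 is the top 16 bits of the float32 encoding, so rounding decides
    whether the discarded low 16 mantissa bits carry up or are dropped.
    Adding a uniform 16-bit dither before truncation makes the round-up
    probability proportional to the discarded fraction.
    Illustrative only: no NaN/inf handling.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    dither = rng.integers(0, 1 << 16, size=bits.shape, dtype=np.uint32)
    return ((bits + dither) & np.uint32(0xFFFF0000)).view(np.float32)


def expanding_dotp(a, b, acc_dtype=np.float32):
    """Dot product of narrow operands with a wider accumulator,
    mimicking the expanding dot-product idea described in the abstract."""
    acc = acc_dtype(0)
    for x, y in zip(a, b):
        acc = acc_dtype(acc + acc_dtype(x) * acc_dtype(y))
    return acc


def narrow_dotp(a, b):
    """Same dot product, but products and sums stay in the input precision."""
    acc = a.dtype.type(0)
    for x, y in zip(a, b):
        acc = a.dtype.type(acc + x * y)
    return acc


if __name__ == "__main__":
    # float16 operands stand in for the narrow formats (FP8/FP16).
    a = rng.standard_normal(4096).astype(np.float16)
    b = rng.standard_normal(4096).astype(np.float16)

    ref = np.dot(a.astype(np.float64), b.astype(np.float64))
    print("narrow accumulation error:", abs(float(narrow_dotp(a, b)) - ref))
    print("wide accumulation error  :", abs(float(expanding_dotp(a, b)) - ref))

    # Stochastic rounding: averaging many independently rounded copies of a
    # value lying between two bfloat16 grid points recovers it in expectation.
    x = np.full(10000, 1.001, dtype=np.float32)
    print("mean of stochastic rounds of 1.001 to bf16:",
          stochastic_round_fp32_to_bf16(x, rng).mean())
```

Run as a plain script, the wide-accumulator dot product lands much closer to the float64 reference than the narrow one, and the average of the stochastically rounded copies approaches 1.001 even though the nearest bfloat16 values are 1.0 and 1.0078125; this is only meant to convey why the extensions accumulate at higher precision and offer stochastic rounding, not to reproduce the paper's results.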