A floating point conversion algorithm for mixed precision computations

Bibliographic Details
Published in: Frontiers of Information Technology & Electronic Engineering, 2012-09, Vol. 13 (9), p. 711-718
Main Authors: Hoo, Choon Lih, Haris, Sallehuddin Mohamed, Mohamed, Nik Abdullah Nik
Format: Article
Language: English
Description
Summary: The floating point number is the most commonly used real number representation in digital computation because of its high precision. It is used on computers and in single-chip applications such as DSP chips. Double precision (64-bit) representations cover a wider range of real numbers, whereas single precision (32-bit) operations are more efficient. Recently, there has been increasing interest in mixed precision computations, which exploit single precision efficiency while working with 64-bit numbers. This requires the ability to interchange between the two formats. In this paper, an algorithm that converts floating point numbers from 64- to 32-bit representations is presented. The algorithm was implemented in Verilog and tested on a field programmable gate array (FPGA) using the Quartus II DE2 board and an Agilent 16821A portable logic analyzer. Results indicate that the algorithm performs the conversion reliably and accurately within a constant execution time of 25 ns at a 20 MHz clock frequency, regardless of the number being converted.
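The paper's design is a Verilog hardware circuit, which is not reproduced here. As a rough software analogue of the general 64- to 32-bit technique the abstract describes, the sketch below repacks an IEEE 754 double into a single by rebiasing the exponent (1023 to 127) and truncating the mantissa (52 to 23 bits). This is an illustrative assumption, not the authors' algorithm: the rounding mode (truncation), the flushing of subnormal results to zero, and all names are hypothetical choices made for this sketch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative 64- to 32-bit IEEE 754 conversion by bit manipulation.
 * Mantissa bits are truncated (the paper's design may round differently),
 * and results too small for a normal float are flushed to zero rather
 * than converted to subnormals. */
static uint32_t double_bits_to_float_bits(uint64_t d)
{
    uint32_t sign = (uint32_t)(d >> 63);            /* 1-bit sign       */
    int32_t  exp  = (int32_t)((d >> 52) & 0x7FF);   /* 11-bit exponent  */
    uint64_t frac = d & 0xFFFFFFFFFFFFFULL;         /* 52-bit mantissa  */

    if (exp == 0x7FF)                               /* Inf or NaN       */
        return (sign << 31) | 0x7F800000u | (frac ? 0x400000u : 0u);

    int32_t e = exp - 1023 + 127;                   /* rebias exponent  */
    if (e >= 0xFF)                                  /* overflow -> Inf  */
        return (sign << 31) | 0x7F800000u;
    if (e <= 0)                                     /* underflow -> 0   */
        return sign << 31;                          /* subnormals flushed */

    /* Pack sign, 8-bit exponent, top 23 of 52 mantissa bits. */
    return (sign << 31) | ((uint32_t)e << 23) | (uint32_t)(frac >> 29);
}

int main(void)
{
    double d = 3.14159265358979;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                 /* type-pun safely  */
    uint32_t fbits = double_bits_to_float_bits(bits);
    float f;
    memcpy(&f, &fbits, sizeof f);
    printf("%.7f\n", f);    /* truncated single-precision value */
    return 0;
}
```

In hardware, the same field extraction, rebiasing, and repacking can be done with fixed wiring and a small adder, which is consistent with the constant conversion time reported in the abstract.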
ISSN: 1869-1951, 2095-9184, 1869-196X, 2095-9230
DOI: 10.1631/jzus.C1200043