A floating point conversion algorithm for mixed precision computations
Published in: | Frontiers of information technology & electronic engineering 2012-09, Vol.13 (9), p.711-718 |
---|---|
Main Authors: | Hoo, Choon Lih; Haris, Sallehuddin Mohamed; Mohamed, Nik Abdullah Nik |
Format: | Article |
Language: | English |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c319t-4b2ccadbe8e5183566fbf1a920b01916da5140e85c42b0f7c49ea7cd0861262e3 |
container_end_page | 718 |
container_issue | 9 |
container_start_page | 711 |
container_title | Frontiers of information technology & electronic engineering |
container_volume | 13 |
creator | Hoo, Choon Lih; Haris, Sallehuddin Mohamed; Mohamed, Nik Abdullah Nik |
description | The floating point number is the most commonly used real number representation for digital computations because of its high precision. It is used on computers and in single-chip applications such as DSP chips. Double precision (64-bit) representations can denote a wider range of real numbers; however, single precision (32-bit) operations are more efficient. Recently, there has been growing interest in mixed precision computations, which exploit the efficiency of single precision arithmetic when working with 64-bit numbers. This calls for the ability to convert between the two formats. In this paper, an algorithm that converts floating point numbers from 64-bit to 32-bit representations is presented. The algorithm was implemented in Verilog and tested on a field programmable gate array (FPGA) using the Quartus II DE2 board and an Agilent 16821A portable logic analyzer. Results indicate that the algorithm performs the conversion reliably and accurately within a constant execution time of 25 ns at a 20 MHz clock frequency, regardless of the number being converted. |
doi_str_mv | 10.1631/jzus.C1200043 |
format | article |
publisher | SP Zhejiang University Press, Heidelberg |
rights | Journal of Zhejiang University Science Editorial Office and Springer-Verlag Berlin Heidelberg 2012 |
abbreviated_title | J. Zhejiang Univ. - Sci. C (Journal of Zhejiang University Science) |
peer_review | peer reviewed |
fulltext | fulltext |
identifier | ISSN: 1869-1951 |
ispartof | Frontiers of information technology & electronic engineering, 2012-09, Vol.13 (9), p.711-718 |
issn | 1869-1951; 2095-9184; 1869-196X; 2095-9230 |
language | eng |
recordid | cdi_proquest_miscellaneous_1323230052 |
source | Springer Nature |
subjects | Algorithms; Chips; Communications Engineering; Computation; Computer Hardware; Computer Science; Computer Systems Organization and Communication Networks; Conversion; DSP chips; Electrical Engineering; Electronics and Microelectronics; Field programmable gate arrays; Floating point arithmetic; Instrumentation; Logic analysers; Networks; Quartus; Real numbers; Representations; Verilog code; Floating point numbers; Mixed; Precision computation; Conversion algorithm |
title | A floating point conversion algorithm for mixed precision computations |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-24T10%3A00%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20floating%20point%20conversion%20algorithm%20for%20mixed%20precision%20computations&rft.jtitle=Frontiers%20of%20information%20technology%20&%20electronic%20engineering&rft.au=Hoo,%20Choon%20Lih&rft.date=2012-09-01&rft.volume=13&rft.issue=9&rft.spage=711&rft.epage=718&rft.pages=711-718&rft.issn=1869-1951&rft.eissn=1869-196X&rft_id=info:doi/10.1631/jzus.C1200043&rft_dat=%3Cproquest_cross%3E2918723584%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-4b2ccadbe8e5183566fbf1a920b01916da5140e85c42b0f7c49ea7cd0861262e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2918723584&rft_id=info:pmid/&rft_cqvip_id=43272238&rfr_iscdi=true |
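The description above states that the paper converts IEEE 754 double precision (binary64) values to single precision (binary32), but the record does not give the algorithm itself. As a hedged illustration only, the Python sketch below performs the same format conversion at the bit level. It unpacks the sign, 11-bit exponent and 52-bit fraction, rebiases the exponent from 1023 to 127 (so e32 = e64 - 896), rounds the fraction to 23 bits with round-to-nearest-even, and saturates to infinity or flushes to zero when the result leaves the normal binary32 range. The function name `f64_to_f32_bits` is hypothetical, and the rounding mode, flush-to-zero handling of binary32 subnormals, and quiet-NaN output are assumptions of this sketch, not details taken from the authors' Verilog design.

```python
import struct


def f64_to_f32_bits(x: float) -> int:
    """Hypothetical sketch: convert a binary64 value to binary32 bit pattern.

    Assumptions: round-to-nearest-even on the fraction, binary32 subnormal
    results flushed to signed zero, NaNs returned as quiet NaNs.
    """
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = (b >> 63) & 0x1
    exp64 = (b >> 52) & 0x7FF            # 11-bit biased exponent (bias 1023)
    frac64 = b & ((1 << 52) - 1)         # 52-bit fraction

    if exp64 == 0x7FF:                   # infinity or NaN
        if frac64 == 0:
            return (sign << 31) | (0xFF << 23)
        # Quiet NaN: set the top fraction bit, keep the upper payload bits.
        return (sign << 31) | (0xFF << 23) | (1 << 22) | (frac64 >> 29)
    if exp64 == 0:                       # zero or binary64 subnormal: far too small
        return sign << 31

    exp32 = exp64 - 1023 + 127           # rebias: 1023 -> 127 (exp64 - 896)

    # Keep the top 23 fraction bits; the dropped 29 bits decide the rounding.
    frac32 = frac64 >> 29
    rem = frac64 & ((1 << 29) - 1)
    half = 1 << 28
    if rem > half or (rem == half and (frac32 & 1)):
        frac32 += 1
        if frac32 == (1 << 23):          # fraction overflowed: carry into exponent
            frac32 = 0
            exp32 += 1

    if exp32 >= 0xFF:                    # too large for binary32: infinity
        return (sign << 31) | (0xFF << 23)
    if exp32 <= 0:                       # below the normal range: flush to zero
        return sign << 31
    return (sign << 31) | (exp32 << 23) | frac32


if __name__ == "__main__":
    print(hex(f64_to_f32_bits(0.1)))     # 0x3dcccccd
```

For inputs in the normal binary32 range the result should match the bits that Python's own `struct.pack("<f", x)` produces, which makes that call a convenient cross-check. A hardware design like the one the record describes would instead map each of these branches onto combinational logic; whether the authors truncate or round, and how they treat subnormals, is not stated in this record.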