OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems
With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a larg...
Published in: | Sensors (Basel, Switzerland), 2021-12, Vol.21 (24), p.8271 |
---|---|
Main Authors: | Nguyen, Duy-Thanh; Ho, Nhut-Minh; Wong, Weng-Fai; Chang, Ik-Joon |
Format: | Article |
Language: | English |
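Subjects: | Architecture; Buses; Cosmic rays; DDR5; DRAM chips; Error correction; Error correction & detection; error correction codes; Fault detection; memory architecture; memory management; on-die ECC; Radiation; Reliability; Retention; Software |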
cited_by | cdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3 |
---|---|
cites | cdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3 |
container_end_page | |
container_issue | 24 |
container_start_page | 8271 |
container_title | Sensors (Basel, Switzerland) |
container_volume | 21 |
creator | Nguyen, Duy-Thanh; Ho, Nhut-Minh; Wong, Weng-Fai; Chang, Ik-Joon |
description | With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation. |
doi_str_mv | 10.3390/s21248271 |
format | article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_17a2e519339642ce95d90fdf059d0c90</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_17a2e519339642ce95d90fdf059d0c90</doaj_id><sourcerecordid>2612870243</sourcerecordid><originalsourceid>FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</originalsourceid><addsrcrecordid>eNpdks1uEzEURkcIREvbBS-ALLGBxYD_ZmKzqNSmCVQKitSma8tj30kdnHGxPZXy9p00IWpZ2Z99dHx1fYviI8HfGJP4e6KEckFH5E1xTDjlpaAUv32xPyo-pLTCmDLGxPviiHFZY1bJ48LNLyeLH2jelfkeyqnfoMtNhnIGj-DRJMYQ0SJq88d1S9QOYRxiBJO3UXcWXUHep6nufU7Ideiui-Cdbjygq5uL3-h2kzKs02nxrtU-wdl-PSnuppPF-Fc5m_-8Hl_MSsNrmUvNBW-gsY0WVU11i7FpmW0ZJiPOMGsaUelKYytZLRrSEAAjW0Gt1UbUhhl2UlzvvDbolXqIbq3jRgXt1PNBiEulY3bGgyIjTaEicmhizakBWVmJW9viSlpsJB5c5zvXQ9-swRroctT-lfT1Tefu1TI8KjHCgjIyCL7sBTH87SFltXbJgPe6g9AnRWtSEUJxvX3r83_oKvSxG1q1pehgpJwN1NcdZWJIKUJ7KIZgtR0GdRiGgf30svoD-e_32RO_x64C</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2612870243</pqid></control><display><type>article</type><title>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Nguyen, Duy-Thanh ; Ho, Nhut-Minh ; Wong, Weng-Fai ; Chang, Ik-Joon</creator><creatorcontrib>Nguyen, Duy-Thanh ; Ho, Nhut-Minh ; Wong, Weng-Fai ; Chang, Ik-Joon</creatorcontrib><description>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. 
We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.</description><identifier>ISSN: 1424-8220</identifier><identifier>EISSN: 1424-8220</identifier><identifier>DOI: 10.3390/s21248271</identifier><identifier>PMID: 34960359</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Architecture ; Buses ; Cosmic rays ; DDR5 ; DRAM chips ; Error correction ; Error correction & detection ; error correction codes ; Fault detection ; memory architecture ; memory management ; on-die ECC ; Radiation ; Reliability ; Retention ; Software</subject><ispartof>Sensors (Basel, Switzerland), 2021-12, Vol.21 (24), p.8271</ispartof><rights>2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 by the authors. 
2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</citedby><cites>FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</cites><orcidid>0000-0002-4281-2053 ; 0000-0002-8871-8695 ; 0000-0002-3864-8027 ; 0000-0003-3029-4268</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2612870243/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2612870243?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,25731,27901,27902,36989,36990,44566,53766,53768,74869</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34960359$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Nguyen, Duy-Thanh</creatorcontrib><creatorcontrib>Ho, Nhut-Minh</creatorcontrib><creatorcontrib>Wong, Weng-Fai</creatorcontrib><creatorcontrib>Chang, Ik-Joon</creatorcontrib><title>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</title><title>Sensors (Basel, Switzerland)</title><addtitle>Sensors (Basel)</addtitle><description>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. 
DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.</description><subject>Architecture</subject><subject>Buses</subject><subject>Cosmic rays</subject><subject>DDR5</subject><subject>DRAM chips</subject><subject>Error correction</subject><subject>Error correction & detection</subject><subject>error correction codes</subject><subject>Fault detection</subject><subject>memory architecture</subject><subject>memory management</subject><subject>on-die ECC</subject><subject>Radiation</subject><subject>Reliability</subject><subject>Retention</subject><subject>Software</subject><issn>1424-8220</issn><issn>1424-8220</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpdks1uEzEURkcIREvbBS-ALLGBxYD_ZmKzqNSmCVQKitSma8tj30kdnHGxPZXy9p00IWpZ2Z99dHx1fYviI8HfGJP4e6KEckFH5E1xTDjlpaAUv32xPyo-pLTCmDLGxPviiHFZY1bJ48LNLyeLH2jelfkeyqnfoMtNhnIGj-DRJMYQ0SJq88d1S9QOYRxiBJO3UXcWXUHep6nufU7Ideiui-Cdbjygq5uL3-h2kzKs02nxrtU-wdl-PSnuppPF-Fc5m_-8Hl_MSsNrmUvNBW-gsY0WVU11i7FpmW0ZJiPOMGsaUelKYytZLRrSEAAjW0Gt1UbUhhl2UlzvvDbolXqIbq3jRgXt1PNBiEulY3bGgyIjTaEicmhizakBWVmJW9viSlpsJB5c5zvXQ9-swRroctT-lfT1Tefu1TI8KjHCgjIyCL7sBTH87SFltXbJgPe6g9AnRWtSEUJxvX3r83_oKvSxG1q1pehgpJwN1NcdZWJIKUJ7KIZgtR0GdRiGgf30svoD-e_32RO_x64C</recordid><startdate>20211210</startdate><enddate>20211210</enddate><creator>Nguyen, Duy-Thanh</creator><creator>Ho, 
Nhut-Minh</creator><creator>Wong, Weng-Fai</creator><creator>Chang, Ik-Joon</creator><general>MDPI AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4281-2053</orcidid><orcidid>https://orcid.org/0000-0002-8871-8695</orcidid><orcidid>https://orcid.org/0000-0002-3864-8027</orcidid><orcidid>https://orcid.org/0000-0003-3029-4268</orcidid></search><sort><creationdate>20211210</creationdate><title>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</title><author>Nguyen, Duy-Thanh ; Ho, Nhut-Minh ; Wong, Weng-Fai ; Chang, Ik-Joon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Architecture</topic><topic>Buses</topic><topic>Cosmic rays</topic><topic>DDR5</topic><topic>DRAM chips</topic><topic>Error correction</topic><topic>Error correction & detection</topic><topic>error correction codes</topic><topic>Fault detection</topic><topic>memory architecture</topic><topic>memory management</topic><topic>on-die ECC</topic><topic>Radiation</topic><topic>Reliability</topic><topic>Retention</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nguyen, 
Duy-Thanh</creatorcontrib><creatorcontrib>Ho, Nhut-Minh</creatorcontrib><creatorcontrib>Wong, Weng-Fai</creatorcontrib><creatorcontrib>Chang, Ik-Joon</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health & Medical Collection (Proquest)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Sensors (Basel, Switzerland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nguyen, Duy-Thanh</au><au>Ho, Nhut-Minh</au><au>Wong, Weng-Fai</au><au>Chang, 
Ik-Joon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</atitle><jtitle>Sensors (Basel, Switzerland)</jtitle><addtitle>Sensors (Basel)</addtitle><date>2021-12-10</date><risdate>2021</risdate><volume>21</volume><issue>24</issue><spage>8271</spage><pages>8271-</pages><issn>1424-8220</issn><eissn>1424-8220</eissn><abstract>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>34960359</pmid><doi>10.3390/s21248271</doi><orcidid>https://orcid.org/0000-0002-4281-2053</orcidid><orcidid>https://orcid.org/0000-0002-8871-8695</orcidid><orcidid>https://orcid.org/0000-0002-3864-8027</orcidid><orcidid>https://orcid.org/0000-0003-3029-4268</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1424-8220 |
ispartof | Sensors (Basel, Switzerland), 2021-12, Vol.21 (24), p.8271 |
issn | 1424-8220 (ISSN); 1424-8220 (EISSN) |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_17a2e519339642ce95d90fdf059d0c90 |
source | Publicly Available Content Database; PubMed Central |
subjects | Architecture; Buses; Cosmic rays; DDR5; DRAM chips; Error correction; Error correction & detection; error correction codes; Fault detection; memory architecture; memory management; on-die ECC; Radiation; Reliability; Retention; Software |
title | OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T03%3A17%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=OBET:%20On-the-Fly%20Byte-Level%20Error%20Tracking%20for%20Correcting%20and%20Detecting%20Faults%20in%20Unreliable%20DRAM%20Systems&rft.jtitle=Sensors%20(Basel,%20Switzerland)&rft.au=Nguyen,%20Duy-Thanh&rft.date=2021-12-10&rft.volume=21&rft.issue=24&rft.spage=8271&rft.pages=8271-&rft.issn=1424-8220&rft.eissn=1424-8220&rft_id=info:doi/10.3390/s21248271&rft_dat=%3Cproquest_doaj_%3E2612870243%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2612870243&rft_id=info:pmid/34960359&rfr_iscdi=true |