OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems

With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a larg...

Bibliographic Details
Published in:Sensors (Basel, Switzerland), 2021-12, Vol.21 (24), p.8271
Main Authors: Nguyen, Duy-Thanh, Ho, Nhut-Minh, Wong, Weng-Fai, Chang, Ik-Joon
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3
cites cdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3
container_end_page
container_issue 24
container_start_page 8271
container_title Sensors (Basel, Switzerland)
container_volume 21
creator Nguyen, Duy-Thanh
Ho, Nhut-Minh
Wong, Weng-Fai
Chang, Ik-Joon
description With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.
doi_str_mv 10.3390/s21248271
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_17a2e519339642ce95d90fdf059d0c90</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_17a2e519339642ce95d90fdf059d0c90</doaj_id><sourcerecordid>2612870243</sourcerecordid><originalsourceid>FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</originalsourceid><addsrcrecordid>eNpdks1uEzEURkcIREvbBS-ALLGBxYD_ZmKzqNSmCVQKitSma8tj30kdnHGxPZXy9p00IWpZ2Z99dHx1fYviI8HfGJP4e6KEckFH5E1xTDjlpaAUv32xPyo-pLTCmDLGxPviiHFZY1bJ48LNLyeLH2jelfkeyqnfoMtNhnIGj-DRJMYQ0SJq88d1S9QOYRxiBJO3UXcWXUHep6nufU7Ideiui-Cdbjygq5uL3-h2kzKs02nxrtU-wdl-PSnuppPF-Fc5m_-8Hl_MSsNrmUvNBW-gsY0WVU11i7FpmW0ZJiPOMGsaUelKYytZLRrSEAAjW0Gt1UbUhhl2UlzvvDbolXqIbq3jRgXt1PNBiEulY3bGgyIjTaEicmhizakBWVmJW9viSlpsJB5c5zvXQ9-swRroctT-lfT1Tefu1TI8KjHCgjIyCL7sBTH87SFltXbJgPe6g9AnRWtSEUJxvX3r83_oKvSxG1q1pehgpJwN1NcdZWJIKUJ7KIZgtR0GdRiGgf30svoD-e_32RO_x64C</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2612870243</pqid></control><display><type>article</type><title>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Nguyen, Duy-Thanh ; Ho, Nhut-Minh ; Wong, Weng-Fai ; Chang, Ik-Joon</creator><creatorcontrib>Nguyen, Duy-Thanh ; Ho, Nhut-Minh ; Wong, Weng-Fai ; Chang, Ik-Joon</creatorcontrib><description>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. 
We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.</description><identifier>ISSN: 1424-8220</identifier><identifier>EISSN: 1424-8220</identifier><identifier>DOI: 10.3390/s21248271</identifier><identifier>PMID: 34960359</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Architecture ; Buses ; Cosmic rays ; DDR5 ; DRAM chips ; Error correction ; Error correction &amp; detection ; error correction codes ; Fault detection ; memory architecture ; memory management ; on-die ECC ; Radiation ; Reliability ; Retention ; Software</subject><ispartof>Sensors (Basel, Switzerland), 2021-12, Vol.21 (24), p.8271</ispartof><rights>2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 by the authors. 
2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</citedby><cites>FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</cites><orcidid>0000-0002-4281-2053 ; 0000-0002-8871-8695 ; 0000-0002-3864-8027 ; 0000-0003-3029-4268</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2612870243/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2612870243?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,25731,27901,27902,36989,36990,44566,53766,53768,74869</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34960359$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Nguyen, Duy-Thanh</creatorcontrib><creatorcontrib>Ho, Nhut-Minh</creatorcontrib><creatorcontrib>Wong, Weng-Fai</creatorcontrib><creatorcontrib>Chang, Ik-Joon</creatorcontrib><title>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</title><title>Sensors (Basel, Switzerland)</title><addtitle>Sensors (Basel)</addtitle><description>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. 
DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.</description><subject>Architecture</subject><subject>Buses</subject><subject>Cosmic rays</subject><subject>DDR5</subject><subject>DRAM chips</subject><subject>Error correction</subject><subject>Error correction &amp; detection</subject><subject>error correction codes</subject><subject>Fault detection</subject><subject>memory architecture</subject><subject>memory management</subject><subject>on-die ECC</subject><subject>Radiation</subject><subject>Reliability</subject><subject>Retention</subject><subject>Software</subject><issn>1424-8220</issn><issn>1424-8220</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpdks1uEzEURkcIREvbBS-ALLGBxYD_ZmKzqNSmCVQKitSma8tj30kdnHGxPZXy9p00IWpZ2Z99dHx1fYviI8HfGJP4e6KEckFH5E1xTDjlpaAUv32xPyo-pLTCmDLGxPviiHFZY1bJ48LNLyeLH2jelfkeyqnfoMtNhnIGj-DRJMYQ0SJq88d1S9QOYRxiBJO3UXcWXUHep6nufU7Ideiui-Cdbjygq5uL3-h2kzKs02nxrtU-wdl-PSnuppPF-Fc5m_-8Hl_MSsNrmUvNBW-gsY0WVU11i7FpmW0ZJiPOMGsaUelKYytZLRrSEAAjW0Gt1UbUhhl2UlzvvDbolXqIbq3jRgXt1PNBiEulY3bGgyIjTaEicmhizakBWVmJW9viSlpsJB5c5zvXQ9-swRroctT-lfT1Tefu1TI8KjHCgjIyCL7sBTH87SFltXbJgPe6g9AnRWtSEUJxvX3r83_oKvSxG1q1pehgpJwN1NcdZWJIKUJ7KIZgtR0GdRiGgf30svoD-e_32RO_x64C</recordid><startdate>20211210</startdate><enddate>20211210</enddate><creator>Nguyen, Duy-Thanh</creator><creator>Ho, 
Nhut-Minh</creator><creator>Wong, Weng-Fai</creator><creator>Chang, Ik-Joon</creator><general>MDPI AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4281-2053</orcidid><orcidid>https://orcid.org/0000-0002-8871-8695</orcidid><orcidid>https://orcid.org/0000-0002-3864-8027</orcidid><orcidid>https://orcid.org/0000-0003-3029-4268</orcidid></search><sort><creationdate>20211210</creationdate><title>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</title><author>Nguyen, Duy-Thanh ; Ho, Nhut-Minh ; Wong, Weng-Fai ; Chang, Ik-Joon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Architecture</topic><topic>Buses</topic><topic>Cosmic rays</topic><topic>DDR5</topic><topic>DRAM chips</topic><topic>Error correction</topic><topic>Error correction &amp; detection</topic><topic>error correction codes</topic><topic>Fault detection</topic><topic>memory architecture</topic><topic>memory management</topic><topic>on-die ECC</topic><topic>Radiation</topic><topic>Reliability</topic><topic>Retention</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nguyen, 
Duy-Thanh</creatorcontrib><creatorcontrib>Ho, Nhut-Minh</creatorcontrib><creatorcontrib>Wong, Weng-Fai</creatorcontrib><creatorcontrib>Chang, Ik-Joon</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection (Proquest)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Sensors (Basel, Switzerland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nguyen, Duy-Thanh</au><au>Ho, Nhut-Minh</au><au>Wong, Weng-Fai</au><au>Chang, 
Ik-Joon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems</atitle><jtitle>Sensors (Basel, Switzerland)</jtitle><addtitle>Sensors (Basel)</addtitle><date>2021-12-10</date><risdate>2021</risdate><volume>21</volume><issue>24</issue><spage>8271</spage><pages>8271-</pages><issn>1424-8220</issn><eissn>1424-8220</eissn><abstract>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000∼7000 times relative to conventional operation.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>34960359</pmid><doi>10.3390/s21248271</doi><orcidid>https://orcid.org/0000-0002-4281-2053</orcidid><orcidid>https://orcid.org/0000-0002-8871-8695</orcidid><orcidid>https://orcid.org/0000-0002-3864-8027</orcidid><orcidid>https://orcid.org/0000-0003-3029-4268</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1424-8220
ispartof Sensors (Basel, Switzerland), 2021-12, Vol.21 (24), p.8271
issn 1424-8220
1424-8220
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_17a2e519339642ce95d90fdf059d0c90
source Publicly Available Content Database; PubMed Central
subjects Architecture
Buses
Cosmic rays
DDR5
DRAM chips
Error correction
Error correction & detection
error correction codes
Fault detection
memory architecture
memory management
on-die ECC
Radiation
Reliability
Retention
Software
title OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T03%3A17%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=OBET:%20On-the-Fly%20Byte-Level%20Error%20Tracking%20for%20Correcting%20and%20Detecting%20Faults%20in%20Unreliable%20DRAM%20Systems&rft.jtitle=Sensors%20(Basel,%20Switzerland)&rft.au=Nguyen,%20Duy-Thanh&rft.date=2021-12-10&rft.volume=21&rft.issue=24&rft.spage=8271&rft.pages=8271-&rft.issn=1424-8220&rft.eissn=1424-8220&rft_id=info:doi/10.3390/s21248271&rft_dat=%3Cproquest_doaj_%3E2612870243%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c469t-a484bebdba8562af00cf3df30174303bb85a5a0d9368b1b1eec9f82ddac86c3c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2612870243&rft_id=info:pmid/34960359&rfr_iscdi=true