Response of HPC hardware to neutron radiation at the dawn of exascale

Bibliographic Details
Published in: The Journal of Supercomputing, 2023-08, Vol. 79 (12), p. 13817-13838
Main Authors: Bustos, Andrés; Rubio-Montero, Antonio Juan; Méndez, Roberto; Rivera, Sergio; González, Francisco; Campo, Xandra; Asorey, Hernán; Mayo-García, Rafael
Format: Article
Language: English
Description
Summary: Every computation carries a small chance that an unexpected phenomenon ruins or modifies its output. Computers are prone to errors that, although very unlikely, are hard, expensive or simply impossible to avoid. At exascale, with thousands of processors involved in a single computation, those errors are especially harmful because they can corrupt or distort the results, wasting human and material resources. In the present work, we study the effect of ionizing radiation on several pieces of commercial hardware that are very common in modern supercomputers. Aiming to reproduce the natural radiation that could arise, CPUs (Xeon, EPYC) and GPUs (A100, V100, T4) are subjected to a known flux of neutrons from two radioactive sources, namely ²⁵²Cf and ²⁴¹Am-Be, in a special irradiation facility. The hardware is irradiated while operating, under supervision, to quantify any errors that appear. Once the hardware response is characterised, we are able to scale down the radiation intensity and estimate the effects on standard data centres. This can help administrators and researchers develop their contingency plans and protocols.
ISSN: 0920-8542; 1573-0484
DOI: 10.1007/s11227-023-05199-y