Proceedings of Euro-Par 2021: Parallel Processing Workshops. Lecture Notes in Computer Science, vol 13098. Springer, Cham.
ISSN/ISBN: 978-3-031-06156-1. DOI: 10.1007/978-3-031-06156-1_25
Abstract: Fault tolerance is a key challenge as high performance computing systems continue to increase component counts, individual component reliability decreases, and hardware and software complexity increases. To better understand the potential impacts of failures on next-generation systems, significant effort has been devoted to collecting, characterizing and analyzing failures on current systems. These studies require large volumes of data and complex analysis in an attempt to identify statistical properties of the failure data.
BibTeX:
@InProceedings{,
author="Ferreira, Kurt B.
and Levy, Scott",
editor="Chaves, Ricardo
and Heras, Dora B.
and Ilic, Aleksandar
and Unat, Didem
and Badia, Rosa M.
and Bracciali, Andrea
and Diehl, Patrick
and Dubey, Anshu
and Oh, Sangyoon
and Scott, Stephen L.
and Ricci, Laura",
title="Characterizing Memory Failures Using Benford's Law",
booktitle="Euro-Par 2021: Parallel Processing Workshops",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="310--321",
isbn="978-3-031-06156-1"
}
Reference Type: Conference Paper
Subject Area(s): Computer Science