Analysis Grand Challenge benchmarking tests on selected sites


Bibliographic Details
Published in: EPJ Web of Conferences 2024, Vol. 295, p. 4006
Main Authors: Koch, David; Kuhr, Thomas; Duckeck, Günter; Hartmann, Nikolai
Format: Article
Language: English
Description
Summary: A fast turn-around time and ease of use are important factors for systems supporting the analysis of large HEP data samples. We study and compare multiple technical approaches. This article describes setting up and benchmarking the Analysis Grand Challenge (AGC) [1] using CMS Open Data. The AGC is an effort to provide a realistic physics analysis with the intent of showcasing the functionality, scalability and feature-completeness of the Scikit-HEP Python ecosystem. We present the results of setting up the necessary software environment for the AGC and benchmarking the run time of the analysis on various computing clusters: the institute SLURM cluster at LMU Munich, a SLURM cluster at LRZ (WLCG Tier-2 site) and the analysis facility Vispa [2], operated by RWTH Aachen. Each site provides a slightly different software environment and mode of operation, which poses interesting challenges for the flexibility of a setup like the one intended for the AGC. Comparing these benchmarks with each other also provides insights into the different storage and caching systems. At LRZ and LMU we have regular Grid storage (HDD) as well as an SSD-based XCache server, while Vispa uses a sophisticated per-node caching system.
ISSN: 2100-014X, 2101-6275
DOI: 10.1051/epjconf/202429504006