Optimization of Large Scale HEP Data Analysis in LHCb

Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local (i.e. non-Grid) computing farm require more efficient access to the data that resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs in combination with the...

Bibliographic Details
Published in: Journal of physics. Conference series 2011-12, Vol.331 (7), p.072060-8
Main Authors: Remenska, Daniela; Aaij, Roel; Raven, Gerhard; Merk, Marcel; Templon, Jeff; Bril, Reinder J
Format: Article
Language: English
container_end_page 8
container_issue 7
container_start_page 072060
container_title Journal of physics. Conference series
container_volume 331
creator Remenska, Daniela
Aaij, Roel
Raven, Gerhard
Merk, Marcel
Templon, Jeff
Bril, Reinder J
description Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local (i.e. non-Grid) computing farm require more efficient access to the data that resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs, in combination with the latency due to the remote access protocols (e.g. rfio, dcap), causes a low CPU efficiency of these jobs. In addition to causing a low CPU efficiency, the remote access protocols give rise to high overhead (in terms of the amount of data transferred). This paper gives an overview of the concept of pre-fetching and caching of input files in the proximity of the processing resources, which is exploited to cope with the I/O-bound analysis jobs. The files are copied from Grid storage elements (using GridFTP) while computations are performed concurrently, an approach inspired by a similar idea used in the ATLAS experiment. The results illustrate that this file staging approach is relatively insensitive to the original location of the data, and that a significant improvement can be achieved in terms of the CPU efficiency of an analysis job. Dealing with the scalability of such a solution in the Grid environment is discussed briefly.
doi_str_mv 10.1088/1742-6596/331/7/072060
format article
creatorcontrib the LHCb Collaboration
publisher Bristol: IOP Publishing
rights Copyright IOP Publishing Dec 2011
date 2011-12-23
tpages 8
oa free_for_read
fulltext fulltext
identifier ISSN: 1742-6596
ispartof Journal of physics. Conference series, 2011-12, Vol.331 (7), p.072060-8
issn 1742-6596
1742-6588
1742-6596
language eng
recordid cdi_proquest_miscellaneous_1825524826
source Publicly Available Content Database; Free Full-Text Journals in Chemistry
subjects Caching
Central processing units
Computation
Data analysis
Data processing
Efficiency
Farms
Input output analysis
Optimization
Physicists
Physics
Remote searching
Reproduction
title Optimization of Large Scale HEP Data Analysis in LHCb
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T16%3A19%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimization%20of%20Large%20Scale%20HEP%20Data%20Analysis%20in%20LHCb&rft.jtitle=Journal%20of%20physics.%20Conference%20series&rft.au=Remenska,%20Daniela&rft.aucorp=the%20LHCb%20Collaboration&rft.date=2011-12-23&rft.volume=331&rft.issue=7&rft.spage=072060&rft.epage=8&rft.pages=072060-8&rft.issn=1742-6596&rft.eissn=1742-6596&rft_id=info:doi/10.1088/1742-6596/331/7/072060&rft_dat=%3Cproquest_cross%3E2579481152%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c391t-9e9f81e17d12dc114a7bc19fa68cf737f48fcc10cf8ab089200d285691cf50793%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2579481152&rft_id=info:pmid/&rfr_iscdi=true