Optimization of Large Scale HEP Data Analysis in LHCb
Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local computing farm (i.e. non-Grid) require more efficient access to the data that resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs in combination with the...
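The staging scheme the abstract describes — copying the next input file from Grid storage while the job computes on the current one — can be sketched as a simple overlap of download and processing. This is a minimal illustration, not LHCb code: `fetch_file` and `process_file` are hypothetical stand-ins for a GridFTP transfer (e.g. via `globus-url-copy`) and the analysis step.

```python
# Sketch of file staging with prefetch overlap: while the current file is
# being processed, a background thread stages the next one, so the CPU is
# not idle waiting on remote I/O. fetch_file/process_file are placeholders.
from concurrent.futures import ThreadPoolExecutor

def fetch_file(name):
    # Placeholder for a GridFTP copy to local cache; returns the local path.
    return f"/local/cache/{name}"

def process_file(path, results):
    # Placeholder for the CPU-bound analysis of one staged input file.
    results.append(path)

def staged_run(input_files):
    results = []
    if not input_files:
        return results
    with ThreadPoolExecutor(max_workers=1) as stager:
        future = stager.submit(fetch_file, input_files[0])  # stage first file
        for i in range(len(input_files)):
            local_path = future.result()           # wait for the staged file
            if i + 1 < len(input_files):           # overlap: stage the next one
                future = stager.submit(fetch_file, input_files[i + 1])
            process_file(local_path, results)      # compute while next downloads
    return results
```

With a real transfer in `fetch_file`, the download latency of file *n+1* is hidden behind the processing time of file *n*, which is the mechanism behind the CPU-efficiency gain reported in the paper.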
Published in: | Journal of physics. Conference series 2011-12, Vol.331 (7), p.072060-8 |
---|---|
Main Authors: | Remenska, Daniela; Aaij, Roel; Raven, Gerhard; Merk, Marcel; Templon, Jeff; Bril, Reinder J |
Format: | Article |
Language: | English |
Subjects: | Caching; Central processing units; Computation; Data analysis; Data processing; Efficiency; Optimization; Physics |
container_end_page | 8 |
container_issue | 7 |
container_start_page | 072060 |
container_title | Journal of physics. Conference series |
container_volume | 331 |
creator | Remenska, Daniela; Aaij, Roel; Raven, Gerhard; Merk, Marcel; Templon, Jeff; Bril, Reinder J |
description | Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local computing farm (i.e. non-Grid) require more efficient access to the data that resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs, in combination with the latency of the remote access protocols (e.g. rfio, dcap), causes low CPU efficiency for these jobs. In addition to the low CPU efficiency, the remote access protocols incur high overhead in terms of the amount of data transferred. This paper gives an overview of the concept of pre-fetching and caching input files in the proximity of the processing resources, which is exploited to cope with the I/O-bound analysis jobs. The files are copied from Grid storage elements (using GridFTP) while computations proceed concurrently, inspired by a similar idea used in the ATLAS experiment. The results illustrate that this file-staging approach is relatively insensitive to the original location of the data, and that a significant improvement can be achieved in the CPU efficiency of an analysis job. The scalability of such a solution in the Grid environment is discussed briefly. |
doi_str_mv | 10.1088/1742-6596/331/7/072060 |
format | article |
fulltext | fulltext |
identifier | ISSN: 1742-6596 |
ispartof | Journal of physics. Conference series, 2011-12, Vol.331 (7), p.072060-8 |
issn | 1742-6588; 1742-6596 |
language | eng |
recordid | cdi_proquest_miscellaneous_1825524826 |
source | Publicly Available Content Database; Free Full-Text Journals in Chemistry |
subjects | Caching; Central processing units; Computation; Data analysis; Data processing; Efficiency; Farms; Input output analysis; Optimization; Physicists; Physics; Remote searching; Reproduction |
title | Optimization of Large Scale HEP Data Analysis in LHCb |