
Managing very large distributed data sets on a data grid

In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved in the development of the data management m...

Bibliographic Details
Published in: Concurrency and computation 2010-08, Vol.22 (11), p.1338-1364
Main Authors: Branco, Miguel, Zaluska, Ed, de Roure, David, Lassnig, Mario, Garonne, Vincent
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c4049-849c0819762ee5aa280d1f1560163cd345c837ed6d7854824b4d34083a281e693
cites cdi_FETCH-LOGICAL-c4049-849c0819762ee5aa280d1f1560163cd345c837ed6d7854824b4d34083a281e693
container_end_page 1364
container_issue 11
container_start_page 1338
container_title Concurrency and computation
container_volume 22
creator Branco, Miguel
Zaluska, Ed
de Roure, David
Lassnig, Mario
Garonne, Vincent
description In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved in the development of the data management middleware. This middleware, called DQ2, has been used for the last several years by the ATLAS experiment for shipping petabytes of data to research centres and universities worldwide. We describe our experience in developing and deploying DQ2 on the Worldwide LHC computing Grid, a production Grid infrastructure formed of hundreds of computing sites. From this operational experience, we have identified an important degree of uncertainty that underlies the behaviour of large Grid infrastructures. This uncertainty is subjected to a detailed analysis, leading us to present novel modelling and simulation techniques for Data Grids. In addition, we discuss what we perceive as practical limits to the development of data distribution algorithms for Data Grids given the underlying infrastructure uncertainty, and propose future research directions. Copyright © 2009 John Wiley & Sons, Ltd.
doi_str_mv 10.1002/cpe.1489
format article
publisher Chichester, UK: John Wiley & Sons, Ltd
addtitle Concurrency Computat.: Pract. Exper
creationdate 2010-08-10
eissn 1532-0634
tpages 27
rights Copyright © 2009 John Wiley & Sons, Ltd.
toplevel peer_reviewed
fulltext fulltext
identifier ISSN: 1532-0626
ispartof Concurrency and computation, 2010-08, Vol.22 (11), p.1338-1364
issn 1532-0626
1532-0634
language eng
recordid cdi_proquest_miscellaneous_896227429
source Wiley
subjects Computation
Concurrency
data management
distributed systems
grid computing
Hadrons
Infrastructure
Middleware
modelling
simulation
Uncertainty
title Managing very large distributed data sets on a data grid
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T18%3A25%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Managing%20very%20large%20distributed%20data%20sets%20on%20a%20data%20grid&rft.jtitle=Concurrency%20and%20computation&rft.au=Branco,%20Miguel&rft.date=2010-08-10&rft.volume=22&rft.issue=11&rft.spage=1338&rft.epage=1364&rft.pages=1338-1364&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.1489&rft_dat=%3Cproquest_cross%3E896227429%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4049-849c0819762ee5aa280d1f1560163cd345c837ed6d7854824b4d34083a281e693%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1730072203&rft_id=info:pmid/&rfr_iscdi=true