
Toward a data scalable solution for facilitating discovery of science resources

• An extended description of the RDESC use case with example metadata. • An updated description of the GEMS software stack to reflect latest state. • A more in-depth evaluation of GEMS’ ability to answer science-based queries. • A performance and scalability evaluation using the BSBM benchmark. Data-intens...

Bibliographic Details
Published in: Parallel computing 2014-12, Vol.40 (10), p.682-696
Main Authors: Weaver, Jesse, Castellana, Vito Giovanni, Morari, Alessandro, Tumeo, Antonino, Purohit, Sumit, Chappell, Alan, Haglin, David, Villa, Oreste, Choudhury, Sutanay, Schuchardt, Karen, Feo, John
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83
container_end_page 696
container_issue 10
container_start_page 682
container_title Parallel computing
container_volume 40
creator Weaver, Jesse
Castellana, Vito Giovanni
Morari, Alessandro
Tumeo, Antonino
Purohit, Sumit
Chappell, Alan
Haglin, David
Villa, Oreste
Choudhury, Sutanay
Schuchardt, Karen
Feo, John
description • An extended description of the RDESC use case with example metadata. • An updated description of the GEMS software stack to reflect latest state. • A more in-depth evaluation of GEMS’ ability to answer science-based queries. • A performance and scalability evaluation using the BSBM benchmark. Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.
doi_str_mv 10.1016/j.parco.2014.08.002
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1651457205</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167819114001100</els_id><sourcerecordid>1651457205</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83</originalsourceid><addsrcrecordid>eNp9kLtOAzEQRS0EEiHwBTQuaXYZ2_twCgoU8ZIipQm1NfGOkaPNOti7Qfl7NiQ11TTnXs09jN0LyAWI6nGT7zDakEsQRQ46B5AXbCJ0LbNaqeqSTUaqzrSYiWt2k9IGAKpCw4QtV-EHY8ORN9gjTxZbXLfEU2iH3oeOuxC5Q-tb32Pvuy_e-GTDnuKBBzfynjpLPFIKQ7SUbtmVwzbR3flO2efry2r-ni2Wbx_z50VmVan7rFlrIUuBpXQVkaprq1FqCwilLZwCPUPUYCtCqSwqIFeCVRVI4QQ1Vqspezj17mL4Hij1Zjv-RW2LHYUhGVGVoihrCeWIqhNqY0gpkjO76LcYD0aAOeozG_Onzxz1GdBm1Demnk4pGlfsPUVz3tr4SLY3TfD_5n8BDBZ6pg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1651457205</pqid></control><display><type>article</type><title>Toward a data scalable solution for facilitating discovery of science resources</title><source>ScienceDirect Journals</source><creator>Weaver, Jesse ; Castellana, Vito Giovanni ; Morari, Alessandro ; Tumeo, Antonino ; Purohit, Sumit ; Chappell, Alan ; Haglin, David ; Villa, Oreste ; Choudhury, Sutanay ; Schuchardt, Karen ; Feo, John</creator><creatorcontrib>Weaver, Jesse ; Castellana, Vito Giovanni ; Morari, Alessandro ; Tumeo, Antonino ; Purohit, Sumit ; Chappell, Alan ; Haglin, David ; Villa, Oreste ; Choudhury, Sutanay ; Schuchardt, Karen ; Feo, John</creatorcontrib><description>•An extended description of the RDESC use case with example metadata.•An updated description of the GEMS software stack to reflect latest state.•A more in-depth evaluation of GEMS’ ability to answer science-based queries.•A performance and scalability evaluation using the BSBM benchmark. Data-intensive science simultaneously derives from and creates the need for large quantities of data. 
As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.</description><identifier>ISSN: 0167-8191</identifier><identifier>EISSN: 1872-7336</identifier><identifier>DOI: 10.1016/j.parco.2014.08.002</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Benchmarking ; Clusters ; Data intensive ; Gems ; Graph database ; Graphs ; Heterogeneity ; Mathematical models ; Metadata ; Query processing ; Scalability ; Science metadata ; Semantics</subject><ispartof>Parallel computing, 2014-12, Vol.40 (10), p.682-696</ispartof><rights>2014 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Weaver, Jesse</creatorcontrib><creatorcontrib>Castellana, Vito Giovanni</creatorcontrib><creatorcontrib>Morari, Alessandro</creatorcontrib><creatorcontrib>Tumeo, 
Antonino</creatorcontrib><creatorcontrib>Purohit, Sumit</creatorcontrib><creatorcontrib>Chappell, Alan</creatorcontrib><creatorcontrib>Haglin, David</creatorcontrib><creatorcontrib>Villa, Oreste</creatorcontrib><creatorcontrib>Choudhury, Sutanay</creatorcontrib><creatorcontrib>Schuchardt, Karen</creatorcontrib><creatorcontrib>Feo, John</creatorcontrib><title>Toward a data scalable solution for facilitating discovery of science resources</title><title>Parallel computing</title><description>•An extended description of the RDESC use case with example metadata.•An updated description of the GEMS software stack to reflect latest state.•A more in-depth evaluation of GEMS’ ability to answer science-based queries.•A performance and scalability evaluation using the BSBM benchmark. Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. 
We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.</description><subject>Benchmarking</subject><subject>Clusters</subject><subject>Data intensive</subject><subject>Gems</subject><subject>Graph database</subject><subject>Graphs</subject><subject>Heterogeneity</subject><subject>Mathematical models</subject><subject>Metadata</subject><subject>Query processing</subject><subject>Scalability</subject><subject>Science metadata</subject><subject>Semantics</subject><issn>0167-8191</issn><issn>1872-7336</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp9kLtOAzEQRS0EEiHwBTQuaXYZ2_twCgoU8ZIipQm1NfGOkaPNOti7Qfl7NiQ11TTnXs09jN0LyAWI6nGT7zDakEsQRQ46B5AXbCJ0LbNaqeqSTUaqzrSYiWt2k9IGAKpCw4QtV-EHY8ORN9gjTxZbXLfEU2iH3oeOuxC5Q-tb32Pvuy_e-GTDnuKBBzfynjpLPFIKQ7SUbtmVwzbR3flO2efry2r-ni2Wbx_z50VmVan7rFlrIUuBpXQVkaprq1FqCwilLZwCPUPUYCtCqSwqIFeCVRVI4QQ1Vqspezj17mL4Hij1Zjv-RW2LHYUhGVGVoihrCeWIqhNqY0gpkjO76LcYD0aAOeozG_Onzxz1GdBm1Demnk4pGlfsPUVz3tr4SLY3TfD_5n8BDBZ6pg</recordid><startdate>20141201</startdate><enddate>20141201</enddate><creator>Weaver, Jesse</creator><creator>Castellana, Vito Giovanni</creator><creator>Morari, Alessandro</creator><creator>Tumeo, Antonino</creator><creator>Purohit, Sumit</creator><creator>Chappell, Alan</creator><creator>Haglin, David</creator><creator>Villa, Oreste</creator><creator>Choudhury, Sutanay</creator><creator>Schuchardt, Karen</creator><creator>Feo, John</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20141201</creationdate><title>Toward a data scalable solution for facilitating discovery of science resources</title><author>Weaver, Jesse ; Castellana, Vito Giovanni ; Morari, Alessandro ; Tumeo, Antonino ; Purohit, Sumit ; Chappell, Alan ; Haglin, David ; Villa, Oreste 
; Choudhury, Sutanay ; Schuchardt, Karen ; Feo, John</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Benchmarking</topic><topic>Clusters</topic><topic>Data intensive</topic><topic>Gems</topic><topic>Graph database</topic><topic>Graphs</topic><topic>Heterogeneity</topic><topic>Mathematical models</topic><topic>Metadata</topic><topic>Query processing</topic><topic>Scalability</topic><topic>Science metadata</topic><topic>Semantics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Weaver, Jesse</creatorcontrib><creatorcontrib>Castellana, Vito Giovanni</creatorcontrib><creatorcontrib>Morari, Alessandro</creatorcontrib><creatorcontrib>Tumeo, Antonino</creatorcontrib><creatorcontrib>Purohit, Sumit</creatorcontrib><creatorcontrib>Chappell, Alan</creatorcontrib><creatorcontrib>Haglin, David</creatorcontrib><creatorcontrib>Villa, Oreste</creatorcontrib><creatorcontrib>Choudhury, Sutanay</creatorcontrib><creatorcontrib>Schuchardt, Karen</creatorcontrib><creatorcontrib>Feo, John</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Parallel computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Weaver, Jesse</au><au>Castellana, Vito Giovanni</au><au>Morari, Alessandro</au><au>Tumeo, Antonino</au><au>Purohit, 
Sumit</au><au>Chappell, Alan</au><au>Haglin, David</au><au>Villa, Oreste</au><au>Choudhury, Sutanay</au><au>Schuchardt, Karen</au><au>Feo, John</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Toward a data scalable solution for facilitating discovery of science resources</atitle><jtitle>Parallel computing</jtitle><date>2014-12-01</date><risdate>2014</risdate><volume>40</volume><issue>10</issue><spage>682</spage><epage>696</epage><pages>682-696</pages><issn>0167-8191</issn><eissn>1872-7336</eissn><abstract>•An extended description of the RDESC use case with example metadata.•An updated description of the GEMS software stack to reflect latest state.•A more in-depth evaluation of GEMS’ ability to answer science-based queries.•A performance and scalability evaluation using the BSBM benchmark. Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.parco.2014.08.002</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0167-8191
ispartof Parallel computing, 2014-12, Vol.40 (10), p.682-696
issn 0167-8191
1872-7336
language eng
recordid cdi_proquest_miscellaneous_1651457205
source ScienceDirect Journals
subjects Benchmarking
Clusters
Data intensive
Gems
Graph database
Graphs
Heterogeneity
Mathematical models
Metadata
Query processing
Scalability
Science metadata
Semantics
title Toward a data scalable solution for facilitating discovery of science resources
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T12%3A28%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Toward%20a%20data%20scalable%20solution%20for%20facilitating%20discovery%20of%20science%20resources&rft.jtitle=Parallel%20computing&rft.au=Weaver,%20Jesse&rft.date=2014-12-01&rft.volume=40&rft.issue=10&rft.spage=682&rft.epage=696&rft.pages=682-696&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/j.parco.2014.08.002&rft_dat=%3Cproquest_cross%3E1651457205%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1651457205&rft_id=info:pmid/&rfr_iscdi=true