Toward a data scalable solution for facilitating discovery of science resources
Published in: | Parallel computing 2014-12, Vol.40 (10), p.682-696 |
---|---|
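Main Authors: | Weaver, Jesse; Castellana, Vito Giovanni; Morari, Alessandro; Tumeo, Antonino; Purohit, Sumit; Chappell, Alan; Haglin, David; Villa, Oreste; Choudhury, Sutanay; Schuchardt, Karen; Feo, John |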
Format: | Article |
Language: | English |
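Subjects: | Benchmarking; Clusters; Data intensive; Gems; Graph database; Graphs; Heterogeneity; Mathematical models; Metadata; Query processing; Scalability; Science metadata; Semantics |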
cited_by | |
---|---|
container_end_page | 696 |
container_issue | 10 |
container_start_page | 682 |
container_title | Parallel computing |
container_volume | 40 |
creator | Weaver, Jesse; Castellana, Vito Giovanni; Morari, Alessandro; Tumeo, Antonino; Purohit, Sumit; Chappell, Alan; Haglin, David; Villa, Oreste; Choudhury, Sutanay; Schuchardt, Karen; Feo, John |
description | • An extended description of the RDESC use case with example metadata.
• An updated description of the GEMS software stack to reflect its latest state.
• A more in-depth evaluation of GEMS’ ability to answer science-based queries.
• A performance and scalability evaluation using the Berlin SPARQL Benchmark (BSBM).
Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark. |
doi_str_mv | 10.1016/j.parco.2014.08.002 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1651457205</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167819114001100</els_id><sourcerecordid>1651457205</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83</originalsourceid><addsrcrecordid>eNp9kLtOAzEQRS0EEiHwBTQuaXYZ2_twCgoU8ZIipQm1NfGOkaPNOti7Qfl7NiQ11TTnXs09jN0LyAWI6nGT7zDakEsQRQ46B5AXbCJ0LbNaqeqSTUaqzrSYiWt2k9IGAKpCw4QtV-EHY8ORN9gjTxZbXLfEU2iH3oeOuxC5Q-tb32Pvuy_e-GTDnuKBBzfynjpLPFIKQ7SUbtmVwzbR3flO2efry2r-ni2Wbx_z50VmVan7rFlrIUuBpXQVkaprq1FqCwilLZwCPUPUYCtCqSwqIFeCVRVI4QQ1Vqspezj17mL4Hij1Zjv-RW2LHYUhGVGVoihrCeWIqhNqY0gpkjO76LcYD0aAOeozG_Onzxz1GdBm1Demnk4pGlfsPUVz3tr4SLY3TfD_5n8BDBZ6pg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1651457205</pqid></control><display><type>article</type><title>Toward a data scalable solution for facilitating discovery of science resources</title><source>ScienceDirect Journals</source><creator>Weaver, Jesse ; Castellana, Vito Giovanni ; Morari, Alessandro ; Tumeo, Antonino ; Purohit, Sumit ; Chappell, Alan ; Haglin, David ; Villa, Oreste ; Choudhury, Sutanay ; Schuchardt, Karen ; Feo, John</creator><creatorcontrib>Weaver, Jesse ; Castellana, Vito Giovanni ; Morari, Alessandro ; Tumeo, Antonino ; Purohit, Sumit ; Chappell, Alan ; Haglin, David ; Villa, Oreste ; Choudhury, Sutanay ; Schuchardt, Karen ; Feo, John</creatorcontrib><description>•An extended description of the RDESC use case with example metadata.•An updated description of the GEMS software stack to reflect latest state.•A more in-depth evaluation of GEMS’ ability to answer science-based queries.•A performance and scalability evaluation using the BSBM benchmark.
Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.</description><identifier>ISSN: 0167-8191</identifier><identifier>EISSN: 1872-7336</identifier><identifier>DOI: 10.1016/j.parco.2014.08.002</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Benchmarking ; Clusters ; Data intensive ; Gems ; Graph database ; Graphs ; Heterogeneity ; Mathematical models ; Metadata ; Query processing ; Scalability ; Science metadata ; Semantics</subject><ispartof>Parallel computing, 2014-12, Vol.40 (10), p.682-696</ispartof><rights>2014 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Weaver, Jesse</creatorcontrib><creatorcontrib>Castellana, Vito 
Giovanni</creatorcontrib><creatorcontrib>Morari, Alessandro</creatorcontrib><creatorcontrib>Tumeo, Antonino</creatorcontrib><creatorcontrib>Purohit, Sumit</creatorcontrib><creatorcontrib>Chappell, Alan</creatorcontrib><creatorcontrib>Haglin, David</creatorcontrib><creatorcontrib>Villa, Oreste</creatorcontrib><creatorcontrib>Choudhury, Sutanay</creatorcontrib><creatorcontrib>Schuchardt, Karen</creatorcontrib><creatorcontrib>Feo, John</creatorcontrib><title>Toward a data scalable solution for facilitating discovery of science resources</title><title>Parallel computing</title><description>•An extended description of the RDESC use case with example metadata.•An updated description of the GEMS software stack to reflect latest state.•A more in-depth evaluation of GEMS’ ability to answer science-based queries.•A performance and scalability evaluation using the BSBM benchmark.
Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.</description><subject>Benchmarking</subject><subject>Clusters</subject><subject>Data intensive</subject><subject>Gems</subject><subject>Graph database</subject><subject>Graphs</subject><subject>Heterogeneity</subject><subject>Mathematical models</subject><subject>Metadata</subject><subject>Query processing</subject><subject>Scalability</subject><subject>Science 
metadata</subject><subject>Semantics</subject><issn>0167-8191</issn><issn>1872-7336</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp9kLtOAzEQRS0EEiHwBTQuaXYZ2_twCgoU8ZIipQm1NfGOkaPNOti7Qfl7NiQ11TTnXs09jN0LyAWI6nGT7zDakEsQRQ46B5AXbCJ0LbNaqeqSTUaqzrSYiWt2k9IGAKpCw4QtV-EHY8ORN9gjTxZbXLfEU2iH3oeOuxC5Q-tb32Pvuy_e-GTDnuKBBzfynjpLPFIKQ7SUbtmVwzbR3flO2efry2r-ni2Wbx_z50VmVan7rFlrIUuBpXQVkaprq1FqCwilLZwCPUPUYCtCqSwqIFeCVRVI4QQ1Vqspezj17mL4Hij1Zjv-RW2LHYUhGVGVoihrCeWIqhNqY0gpkjO76LcYD0aAOeozG_Onzxz1GdBm1Demnk4pGlfsPUVz3tr4SLY3TfD_5n8BDBZ6pg</recordid><startdate>20141201</startdate><enddate>20141201</enddate><creator>Weaver, Jesse</creator><creator>Castellana, Vito Giovanni</creator><creator>Morari, Alessandro</creator><creator>Tumeo, Antonino</creator><creator>Purohit, Sumit</creator><creator>Chappell, Alan</creator><creator>Haglin, David</creator><creator>Villa, Oreste</creator><creator>Choudhury, Sutanay</creator><creator>Schuchardt, Karen</creator><creator>Feo, John</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20141201</creationdate><title>Toward a data scalable solution for facilitating discovery of science resources</title><author>Weaver, Jesse ; Castellana, Vito Giovanni ; Morari, Alessandro ; Tumeo, Antonino ; Purohit, Sumit ; Chappell, Alan ; Haglin, David ; Villa, Oreste ; Choudhury, Sutanay ; Schuchardt, Karen ; Feo, John</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Benchmarking</topic><topic>Clusters</topic><topic>Data intensive</topic><topic>Gems</topic><topic>Graph 
database</topic><topic>Graphs</topic><topic>Heterogeneity</topic><topic>Mathematical models</topic><topic>Metadata</topic><topic>Query processing</topic><topic>Scalability</topic><topic>Science metadata</topic><topic>Semantics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Weaver, Jesse</creatorcontrib><creatorcontrib>Castellana, Vito Giovanni</creatorcontrib><creatorcontrib>Morari, Alessandro</creatorcontrib><creatorcontrib>Tumeo, Antonino</creatorcontrib><creatorcontrib>Purohit, Sumit</creatorcontrib><creatorcontrib>Chappell, Alan</creatorcontrib><creatorcontrib>Haglin, David</creatorcontrib><creatorcontrib>Villa, Oreste</creatorcontrib><creatorcontrib>Choudhury, Sutanay</creatorcontrib><creatorcontrib>Schuchardt, Karen</creatorcontrib><creatorcontrib>Feo, John</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Parallel computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Weaver, Jesse</au><au>Castellana, Vito Giovanni</au><au>Morari, Alessandro</au><au>Tumeo, Antonino</au><au>Purohit, Sumit</au><au>Chappell, Alan</au><au>Haglin, David</au><au>Villa, Oreste</au><au>Choudhury, Sutanay</au><au>Schuchardt, Karen</au><au>Feo, John</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Toward a data scalable solution for facilitating discovery of science resources</atitle><jtitle>Parallel 
computing</jtitle><date>2014-12-01</date><risdate>2014</risdate><volume>40</volume><issue>10</issue><spage>682</spage><epage>696</epage><pages>682-696</pages><issn>0167-8191</issn><eissn>1872-7336</eissn><abstract>•An extended description of the RDESC use case with example metadata.•An updated description of the GEMS software stack to reflect latest state.•A more in-depth evaluation of GEMS’ ability to answer science-based queries.•A performance and scalability evaluation using the BSBM benchmark.
Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.parco.2014.08.002</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0167-8191 |
ispartof | Parallel computing, 2014-12, Vol.40 (10), p.682-696 |
issn | 0167-8191 (ISSN); 1872-7336 (EISSN) |
language | eng |
source | ScienceDirect Journals |
subjects | Benchmarking; Clusters; Data intensive; Gems; Graph database; Graphs; Heterogeneity; Mathematical models; Metadata; Query processing; Scalability; Science metadata; Semantics |
title | Toward a data scalable solution for facilitating discovery of science resources |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T12%3A28%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Toward%20a%20data%20scalable%20solution%20for%20facilitating%20discovery%20of%20science%20resources&rft.jtitle=Parallel%20computing&rft.au=Weaver,%20Jesse&rft.date=2014-12-01&rft.volume=40&rft.issue=10&rft.spage=682&rft.epage=696&rft.pages=682-696&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/j.parco.2014.08.002&rft_dat=%3Cproquest_cross%3E1651457205%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-db81251a52f6ee377c8a28c0a05c4f3089aa80c6ea23ca30ef50c36021f1edc83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1651457205&rft_id=info:pmid/&rfr_iscdi=true |