Toward a data scalable solution for facilitating discovery of science resources
Published in: | Parallel computing 2014-12, Vol.40 (10), p.682-696 |
---|---|
Main Authors: | , , , , , , , , , , |
Format: | Article |
Language: | English |
Summary: | • An extended description of the RDESC use case with example metadata. • An updated description of the GEMS software stack to reflect latest state. • A more in-depth evaluation of GEMS’ ability to answer science-based queries. • A performance and scalability evaluation using the BSBM benchmark.
Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark. |
---|---|
ISSN: | 0167-8191 1872-7336 |
DOI: | 10.1016/j.parco.2014.08.002 |