Efficient querying of multidimensional RDF data with aggregates: Comparing NoSQL, RDF and relational data stores

Bibliographic Details
Published in: International journal of information management, 2020-10, Vol. 54, p. 102089, Article 102089
Main Authors: Ravat, Franck; Song, Jiefu; Teste, Olivier; Trojahn, Cassia
Format: Article
Language: English
Description
Summary:
• Pre-aggregates improve the performance of all tested data stores.
• Relational databases outperform all other data stores.
• Neo4j NoSQL is recommended over Cassandra NoSQL for manipulating multidimensional data, as it scales better for larger datasets.
• Neo4j performs better than the Jena TDB2 and Virtuoso triple stores.

This paper proposes an approach to the problem of querying large volumes of statistical RDF data. The approach relies on pre-aggregation strategies to better support the analysis of this kind of data. Specifically, we define a conceptual model that represents the original RDF data together with aggregates in a multidimensional structure. We then propose a set of translation rules for converting a well-known multidimensional RDF modelling vocabulary into this conceptual model. We implement the conceptual model in six data stores: two RDF triple stores (Jena TDB and Virtuoso), one graph-oriented NoSQL database (Neo4j), one column-oriented data store (Cassandra), and two relational databases (MySQL and PostgreSQL). We compare querying performance, with and without aggregates, across these data stores. Experimental results on real-world datasets containing 81.92 million triples show that pre-aggregation reduces query runtime in all data stores. Neo4j and the relational databases with aggregates outperform the triple stores, reducing query runtime by up to 99%.
ISSN: 0268-4012; 1873-4707
DOI: 10.1016/j.ijinfomgt.2020.102089