Top-k query optimization over data services
Published in: | Future generation computer systems 2020-12, Vol.113, p.1-12 |
---|---|
Format: | Article |
Language: | English |
Summary: | The efficient evaluation of top-k queries is crucial for many applications in which a huge quantity of data must be ranked and sorted to return the best answers to users in a reasonable time. Examples include e-commerce platforms (e.g., amazon.com), multimedia sharing platforms, and web databases. Most often, these applications need to retrieve data from autonomous data sources. Access to these data sources is provided through popular Web APIs, such as data web services, which offer a standard way to interact with data. In this context, answering a user's query often requires the composition of multiple data services. Most existing solutions for the evaluation of top-k queries assume that data services provide either both sorted and random access to data or only sorted access. In practice, however, some services may provide only random access to data, which can degrade the performance of these solutions. In this paper, we propose an approach to optimize the evaluation of top-k queries over data services. We consider the worst-case scenario, in which services provide only random access to data. Our approach defines two strategies, the Pipeline Parallel Strategy and the Necessary Invocation Principle, to reduce the composition processing time and the number of unnecessary service invocations. The experiments conducted demonstrate the scalability and efficiency of our solution.
• Top-k queries are answered by composing external data sources or data services.
• Unlike relational databases, which are designed to have full access to their data sources, data services are exposed through a limited number of access interfaces, in which the values of some attributes must be specified.
• Our approach, “Efficient Evaluation of Top-k Queries Over Data Services”, provides an optimal algorithm for computing top-k results independently for any service composition plan. It is based mainly on two strategies: the pipeline-parallel strategy and the necessary invocation principle.
• We use the pipelined parallelism query execution model as a first step to speed up top-k query execution.
• The Necessary Invocation Principle minimizes the number of service invocations during the execution of a composition plan by determining whether an invocation is truly necessary to produce the final top-k result.
• The evaluation results demonstrate the scalability and efficiency of our approach. |
---|---|
ISSN: | 0167-739X 1872-7115 |
DOI: | 10.1016/j.future.2020.06.052 |
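
The summary above describes skipping service invocations that cannot affect the final top-k result. The record does not give the paper's algorithm, so the following is only a minimal sketch of that general idea under stated assumptions: a two-service composition, a monotone scoring function (a plain sum), a known upper bound on the second service's score, and random access only (the second service must be called with a key). The names `top_k_with_pruning`, `invoke_second`, and `max_second_score` are hypothetical and do not come from the paper.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_k_with_pruning(
    partial_tuples: Iterable[Tuple[str, float]],
    invoke_second: Callable[[str], float],
    k: int,
    max_second_score: float = 1.0,  # assumed upper bound on the second score
) -> Tuple[List[Tuple[float, str]], int]:
    """Return the k best (total_score, key) pairs and the number of calls made.

    Illustrative pruning rule (an assumption, not the paper's method): call the
    second service for a tuple only if its best possible total score could
    still beat the current k-th best total.
    """
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top k
    invocations = 0
    for key, first_score in partial_tuples:
        best_possible = first_score + max_second_score
        if len(heap) == k and best_possible <= heap[0][0]:
            continue  # this call cannot change the top-k, so skip it
        invocations += 1
        total = first_score + invoke_second(key)
        if len(heap) < k:
            heapq.heappush(heap, (total, key))
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, key))
    return sorted(heap, reverse=True), invocations


# Hypothetical data: scores from a first service, and a lookup table standing
# in for a second, random-access-only service.
first_scores = [("a", 0.9), ("b", 0.8), ("c", 0.2), ("d", 0.1), ("e", 0.7)]
second_scores = {"a": 0.9, "b": 0.7, "c": 1.0, "d": 1.0, "e": 0.95}

result, calls = top_k_with_pruning(first_scores, second_scores.__getitem__, k=2)
print([key for _, key in result], calls)  # ['a', 'e'] 3
```

In this toy run the calls for `c` and `d` are skipped, because even a perfect second score could not lift them into the top 2. A pipelined variant would feed `partial_tuples` from a generator that yields results of the first service as they arrive, so both services are busy at the same time; that part is not shown here.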