
Identifying the most influential data objects with reverse top-k queries

Bibliographic Details
Published in: Proceedings of the VLDB Endowment 2010-09, Vol.3 (1-2), p.364-372
Main Authors: Vlachou, Akrivi; Doulkeridis, Christos; Nørvåg, Kjetil; Kotidis, Yannis
Format: Article
Language: English
Description
Summary: Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (it belongs to the top-k result set of their preferences). In this paper, we address the challenging problem of processing queries that identify the top-m most influential products to customers, where influence is defined as the cardinality of the reverse top-k result set. This definition of influence is useful for market analysis, since it is directly related to the number of customers that value a particular product and, consequently, to its visibility and impact in the market. Existing techniques require processing a reverse top-k query for each object in the database, which is prohibitively expensive even for databases of moderate size. In contrast, we propose two algorithms, SB and BB, for identifying the most influential objects: SB restricts the candidate set of objects that need to be examined, while BB is a branch-and-bound algorithm that retrieves the result incrementally. Furthermore, we propose meaningful variations of the query for most influential objects that are supported by our algorithms. Our experiments demonstrate the efficiency of our algorithms both for synthetic and real-life datasets.
ISSN: 2150-8097
DOI: 10.14778/1920841.1920890
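
The influence definition in the summary (the number of customers whose top-k result contains a product) can be illustrated with a small brute-force sketch. This is not the SB or BB algorithm from the paper; it is only the exhaustive baseline that the summary describes as prohibitively expensive. Several details are assumptions made for illustration and are not stated in the record: each customer is modeled as a weight vector, a product's score is the weighted sum of its attributes, smaller scores rank higher, and ties are broken by product index. All function and variable names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def score(weights: Sequence[float], obj: Sequence[float]) -> float:
    # Assumed scoring model: linear weighted sum, smaller is better.
    return sum(w * x for w, x in zip(weights, obj))


def top_k(weights: Sequence[float], objects: Sequence[Sequence[float]], k: int) -> Set[int]:
    # Indices of the k best objects for one customer (ties broken by index).
    ranked = sorted(range(len(objects)), key=lambda i: (score(weights, objects[i]), i))
    return set(ranked[:k])


def influence_scores(objects: Sequence[Sequence[float]],
                     customers: Sequence[Sequence[float]],
                     k: int) -> List[int]:
    # Influence of object i = number of customers whose top-k contains i,
    # i.e. the cardinality of its reverse top-k result set.
    counts = [0] * len(objects)
    for weights in customers:
        for i in top_k(weights, objects, k):
            counts[i] += 1
    return counts


def top_m_influential(objects: Sequence[Sequence[float]],
                      customers: Sequence[Sequence[float]],
                      k: int, m: int) -> List[Tuple[int, int]]:
    # The m objects with the highest influence, as (index, influence) pairs.
    counts = influence_scores(objects, customers, k)
    ranked = sorted(range(len(objects)), key=lambda i: (-counts[i], i))
    return [(i, counts[i]) for i in ranked[:m]]


# Made-up example data: four products with two attributes each
# (for instance price and distance), and three customer weight vectors.
products = [(1.0, 9.0), (3.0, 3.0), (9.0, 1.0), (8.0, 8.0)]
customers = [(0.9, 0.1), (0.4, 0.6), (0.1, 0.9)]

print(influence_scores(products, customers, k=2))        # [1, 3, 2, 0]
print(top_m_influential(products, customers, k=2, m=2))  # [(1, 3), (2, 2)]
```

In this toy data the balanced product at index 1 appears in every customer's top-2 and is therefore the most influential, while the product at index 3 appears in no top-2 result and has influence 0. The sketch evaluates every product against every customer, so its cost grows with the product of both set sizes; avoiding that exhaustive evaluation is the motivation the summary gives for SB and BB.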