Identifying the most influential data objects with reverse top-k queries
Published in: | Proceedings of the VLDB Endowment 2010-09, Vol.3 (1-2), p.364-372 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Summary: | Top-k queries are widely used to retrieve a ranked set of the k most interesting objects according to individual user preferences. For example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers who find a given product appealing, i.e., the customers whose top-k result set, under their preferences, contains that product. In this paper, we address the challenging problem of processing queries that identify the top-m most influential products to customers, where the influence of a product is defined as the cardinality of its reverse top-k result set. This definition of influence is useful for market analysis, since it is directly related to the number of customers that value a particular product and, consequently, to its visibility and impact in the market. Existing techniques require processing a reverse top-k query for each object in the database, which is prohibitively expensive even for databases of moderate size. In contrast, we propose two algorithms, SB and BB, for identifying the most influential objects: SB restricts the candidate set of objects that need to be examined, while BB is a branch-and-bound algorithm that retrieves the result incrementally. Furthermore, we propose meaningful variations of the query for most influential objects that are supported by our algorithms. Our experiments demonstrate the efficiency of our algorithms on both synthetic and real-life datasets. |
ISSN: | 2150-8097 |
DOI: | 10.14778/1920841.1920890 |
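To make the definitions in the summary concrete, here is a minimal Python sketch under assumptions the abstract does not state: each object is a vector of numeric attributes, each customer is a vector of preference weights, an object's score for a customer is the weighted sum of its attributes, and lower scores are better. The function names (`score`, `top_k`, `reverse_top_k`, `influence`, `top_m_influential_naive`) are hypothetical. The last function mirrors the per-object baseline that the summary calls prohibitively expensive: it issues one reverse top-k query for every object and keeps the m objects with the largest influence.

```python
from collections.abc import Sequence

# Hypothetical model (an assumption, not taken from the paper): an object is a
# vector of attribute values, a user is a vector of preference weights, and a
# lower weighted sum is better.
Point = Sequence[float]
Weights = Sequence[float]


def score(w: Weights, p: Point) -> float:
    """Linear scoring function f_w(p) = sum_i w[i] * p[i]."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(w: Weights, objects: Sequence[Point], k: int) -> list[int]:
    """Indices of the k lowest-scoring objects for user w (ties broken by index)."""
    order = sorted(range(len(objects)), key=lambda i: (score(w, objects[i]), i))
    return order[:k]


def reverse_top_k(q: int, objects: Sequence[Point], users: Sequence[Weights], k: int) -> list[int]:
    """Indices of the users whose top-k result contains object q."""
    return [u for u, w in enumerate(users) if q in top_k(w, objects, k)]


def influence(q: int, objects: Sequence[Point], users: Sequence[Weights], k: int) -> int:
    """Influence of q: the cardinality of its reverse top-k result set."""
    return len(reverse_top_k(q, objects, users, k))


def top_m_influential_naive(objects: Sequence[Point], users: Sequence[Weights], k: int, m: int) -> list[int]:
    """Baseline: one reverse top-k query per object, then keep the m most influential."""
    influences = [influence(q, objects, users, k) for q in range(len(objects))]
    ranked = sorted(range(len(objects)), key=lambda q: (-influences[q], q))
    return ranked[:m]


if __name__ == "__main__":
    objects = [(1, 1), (2, 3), (3, 2), (4, 4)]
    users = [(0.6, 0.4), (0.9, 0.1), (0.1, 0.9)]
    print([influence(q, objects, users, k=2) for q in range(len(objects))])  # [3, 2, 1, 0]
    print(top_m_influential_naive(objects, users, k=2, m=2))  # [0, 1]
```

Each reverse top-k query in this sketch re-ranks the whole dataset for every customer, so the baseline performs on the order of |objects| × |users| full rankings. That structure is kept on purpose to mirror the per-object approach, not to be efficient.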
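The summary says only that SB restricts the candidate set of objects to examine; it does not say how. One standard way to restrict candidates is based on dominance. It is shown here purely as an illustration of candidate restriction, not as the authors' SB algorithm. Assume (hypothetically) linear scores with strictly positive weights, where lower scores are better. If object q is no worse than p in every attribute and strictly better in at least one (q dominates p), then q scores strictly lower than p for every customer. An object dominated by k or more others therefore never appears in any top-k result and has influence 0. Only the k-skyband (objects dominated by fewer than k others) needs to be examined, and each customer's top-k list can be computed over those candidates alone. The names `dominates`, `k_skyband`, and `top_m_influential_pruned` are hypothetical.

```python
from collections.abc import Sequence

# Assumptions (not from the paper): lower attribute values and lower weighted
# sums are better, and every user weight is strictly positive.
Point = Sequence[float]
Weights = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def k_skyband(objects: Sequence[Point], k: int) -> list[int]:
    """Indices of objects dominated by fewer than k others (quadratic scan)."""
    band = []
    for i, p in enumerate(objects):
        dominators = sum(1 for j, q in enumerate(objects) if j != i and dominates(q, p))
        if dominators < k:
            band.append(i)
    return band


def top_m_influential_pruned(objects: Sequence[Point], users: Sequence[Weights], k: int, m: int) -> list[tuple[int, int]]:
    """(index, influence) pairs for the m most influential k-skyband candidates."""
    candidates = k_skyband(objects, k)

    def score(w: Weights, i: int) -> float:
        return sum(wi * pi for wi, pi in zip(w, objects[i]))

    influence = dict.fromkeys(candidates, 0)
    for w in users:
        # Rank only the candidates: under the assumptions above, every object of
        # the full top-k result lies in the k-skyband, so the result is unchanged.
        for i in sorted(candidates, key=lambda i: (score(w, i), i))[:k]:
            influence[i] += 1
    ranked = sorted(candidates, key=lambda i: (-influence[i], i))
    return [(i, influence[i]) for i in ranked[:m]]


if __name__ == "__main__":
    objects = [(1, 1), (2, 3), (3, 2), (4, 4)]
    users = [(0.6, 0.4), (0.9, 0.1), (0.1, 0.9)]
    print(k_skyband(objects, k=2))  # [0, 1, 2]
    print(top_m_influential_pruned(objects, users, k=2, m=2))  # [(0, 3), (1, 2)]
```

The quadratic dominance scan is chosen for brevity; a large dataset would call for an index-based skyband computation. If m exceeds the number of candidates, the function returns fewer than m pairs, because every pruned object has influence 0.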