Loading…

On the Complexity of Query Result Diversification

Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D , a query Q , and a positive integer k , it is to find a set of k tuples from Q ( D ) such that the tuples are as relevant as possible to the query, and at the same time, as diverse as po...

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on database systems 2014-05, Vol.39 (2), p.1-46
Main Authors: Deng, Ting, Fan, Wenfei
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D , a query Q , and a positive integer k , it is to find a set of k tuples from Q ( D ) such that the tuples are as relevant as possible to the query, and at the same time, as diverse as possible to each other. Subsets of Q ( D ) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval, and operations research. This article investigates the complexity of result diversification for relational queries. (1) We identify three problems in connection with query result diversification, to determine whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, to assess the rank of a given k -element set, and to count how many k -element sets are ranked above a given bound based on an objective function. (2) We study these problems for a variety of query languages and for the three objective functions proposed in Gollapudi and Sharma [2009]. We establish the upper and lower bounds of these problems, all matching , for both combined complexity and data complexity. (3) We also investigate several special settings of these problems, identifying tractable cases. Moreover, (4) we reinvestigate these problems in the presence of compatibility constraints commonly found in practice, and provide their complexity in all these settings.
ISSN:0362-5915
1557-4644
DOI:10.1145/2602136