Reducing hardware hit by queries in web search engines

Bibliographic Details
Published in: Information processing & management, 2016-11, Vol. 52 (6), p. 1031-1052
Main Authors: Mendoza, Marcelo; Marín, Mauricio; Gil-Costa, Verónica; Ferrarotti, Flavio
Format: Article
Language: English
Description
Summary:
• We propose a collection selection method outperforming the state of the art.
• We conduct a learning process over a term space with low dimensionality.
• The learning process uses IR measures as reward functions.
• We introduce a new novelty detection method for incremental learning.

In this paper, we introduce a new collection selection strategy for search engines with document-partitioned indexes. Our method selects the document partitions that are most likely to deliver the best results for the formulated queries, reducing the number of queries that are submitted to each partition. The method employs learning algorithms that are capable of ranking the partitions, maximizing the probability of recovering documents with high gain. It operates by building vector representations of each partition on the term space that is spanned by the queries. The proposed method is able to generalize to new queries and produce document lists with high precision for queries not considered during the training phase. To update the representations of each partition, our method employs incremental learning strategies. Beginning with an inversion test of the partition lists, we identify queries that contribute new information and add them to the training phase. The experimental results show that our collection selection method compares favorably with state-of-the-art methods. In addition, our method achieves suitable performance with low parameter sensitivity, making it applicable to search engines with hundreds of partitions.
ISSN: 0306-4573; 1873-5371
DOI: 10.1016/j.ipm.2016.04.008
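
Illustrative sketch (not taken from the article): the summary describes ranking partitions through vector representations on the query term space and using an inversion test to spot queries that carry new information. The Python below is a minimal sketch of that general idea only. It assumes hand-written partition vectors, plain cosine similarity as the scoring function, and a simple pairwise inversion count; the function names, the PARTITIONS data, and the scoring rule are all hypothetical, and the learning procedure the authors actually use is not described in this record.

```python
import math
from collections import Counter

def query_vector(query):
    """Bag-of-words term counts for a query string."""
    return Counter(query.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_partitions(query, partition_vectors):
    """Order partition ids by decreasing similarity to the query.

    Ties are broken by partition id so the output is deterministic.
    """
    q = query_vector(query)
    scores = {p: cosine(q, vec) for p, vec in partition_vectors.items()}
    return sorted(scores, key=lambda p: (-scores[p], p))

def count_inversions(predicted, reference):
    """Number of partition pairs ordered differently in the two rankings.

    Assumption: a high count suggests the query is poorly covered by the
    current representations and may be worth adding to training.
    """
    pos = {p: i for i, p in enumerate(reference)}
    seq = [pos[p] for p in predicted]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )

# Hypothetical partition representations over a tiny term space.
PARTITIONS = {
    "p0": {"football": 3.0, "league": 1.0},
    "p1": {"python": 2.0, "code": 2.0},
    "p2": {"football": 1.0, "code": 1.0},
}

predicted = rank_partitions("football league", PARTITIONS)
# Hypothetical reference ordering, e.g. derived from observed results.
inversions = count_inversions(predicted, ["p0", "p1", "p2"])
```

In a setting like the one the summary describes, only the top-ranked partitions would receive the query, and a threshold on the inversion count (a further assumption here) could decide whether a query is treated as novel.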