
Score distributions for Pseudo Relevance Feedback

Bibliographic Details
Published in: Information Sciences, 2014-07, Vol. 273, p. 171-181
Main Authors: Parapar, Javier, Presedo-Quindimil, Manuel A., Barreiro, Álvaro
Format: Article
Language: English
Description
Summary: Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance into the statistical language modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the Pseudo Relevance Feedback task. It is known that one of the factors that most affects the robustness of Pseudo Relevance Feedback is the selection, for some queries, of harmful expansion terms. To minimise this effect, a crucial point in these methods is to reduce the number of non-relevant documents in the pseudo-relevant set. In this paper, we propose an original approach to tackle this problem: we try to automatically determine, for each query, how many documents should be selected as the pseudo-relevant set. To achieve this objective, we study the score distributions of the initial retrieval and try to discern, on the basis of those distributions, between relevant and non-relevant documents. Evaluation of our proposal showed important improvements in terms of robustness.
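The general idea described in the abstract can be sketched as follows: model the scores of the initial retrieval as a mixture of a "relevant" and a "non-relevant" component, and cut the pseudo-relevant set where the estimated probability of relevance drops. The sketch below is only an illustration of this kind of per-query cutoff selection; the choice of a two-component Gaussian mixture, the scikit-learn fit, and the 0.5 threshold are assumptions for the example, not the exact method of the paper.

```python
# Illustrative sketch: pick a per-query pseudo-relevant set size from the
# score distribution of the initial retrieval.  The mixture model and the
# threshold are assumptions, not the paper's specific procedure.
import numpy as np
from sklearn.mixture import GaussianMixture

def pseudo_relevant_cutoff(scores, max_docs=100, threshold=0.5):
    """Return how many top-ranked documents to treat as pseudo-relevant."""
    scores = np.sort(np.asarray(scores, dtype=float))[::-1]  # descending rank order
    # Fit a two-component mixture to the retrieval scores.
    gm = GaussianMixture(n_components=2, random_state=0)
    gm.fit(scores.reshape(-1, 1))
    # Take the component with the higher mean as the "relevant" one.
    relevant = int(np.argmax(gm.means_.ravel()))
    posteriors = gm.predict_proba(scores.reshape(-1, 1))[:, relevant]
    # Keep top-ranked documents while the posterior of relevance stays high.
    k = 0
    for p in posteriors[:max_docs]:
        if p < threshold:
            break
        k += 1
    return max(k, 1)  # always keep at least one pseudo-relevant document

# Example with synthetic, bimodal scores: a small high-scoring group
# (stand-in for relevant documents) plus a large low-scoring tail.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(8.0, 0.5, 15), rng.normal(5.0, 1.0, 985)])
print(pseudo_relevant_cutoff(scores))
```

With such a per-query cutoff, queries whose initial retrieval mixes relevant and non-relevant documents early in the ranking receive a smaller pseudo-relevant set, which limits the harmful expansion terms the abstract refers to.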
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2014.03.034