Improved sentence retrieval using local context and sentence length

Bibliographic Details
Published in: Information Processing & Management, 2013-11, Vol. 49 (6), p. 1301-1312
Main Authors: Doko, Alen, Štula, Maja, Šerić, Ljiljana
Format: Article
Language: English
Description
Summary:
• We extend the TF–ISF method to use local context.
• We extend the TF–ISF method to promote retrieval of long sentences.
• Using context and promoting retrieval of long sentences both improve sentence retrieval.
• We also combine using context and promoting retrieval of long sentences.
• It is useful to use context and promote retrieval of long sentences at the same time.

In this paper we propose improved variants of the sentence retrieval method TF–ISF (a variant of TF–IDF, or Term Frequency–Inverse Document Frequency, adapted to sentence retrieval). The improvement is achieved by using context consisting of neighboring sentences and, at the same time, promoting the retrieval of longer sentences. We thoroughly compare the new modified TF–ISF methods to three alternatives: the TF–ISF baseline; tfmix, an earlier attempt to include context in TF–ISF; and 3MMPDS, a language-modeling-based method that uses context and promotes retrieval of long sentences. Experimental results show that the TF–ISF method can be improved by using local context. Results also show that the TF–ISF method can be improved by promoting the retrieval of longer sentences. Finally, we show that the best results are achieved when both modifications are combined. All new methods (TF–ISF variants) also show statistically significantly better results than the other tested methods.
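The ideas in the summary can be illustrated with a minimal sketch. It assumes the commonly cited TF–ISF ranking function (a sum over query terms of log-scaled query and sentence term frequencies multiplied by a log inverse sentence frequency). The context mixing weight `lam`, the length exponent `mu`, and the helper names `with_context` and `with_length` are hypothetical and chosen only for illustration; the record does not give the paper's actual formulas, so this is not the authors' method.

```python
import math
from collections import Counter

_PUNCT = ".,;:!?()\"'"


def tokenize(text):
    # Simple lowercase whitespace tokenizer (an assumption, not from the paper).
    tokens = (w.strip(_PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def tf_isf_scores(query, sentences):
    """Score each sentence against the query with a commonly cited TF-ISF formula."""
    q_tf = Counter(tokenize(query))
    sent_tfs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # sf[t] = number of sentences that contain term t
    sf = Counter(t for tf in sent_tfs for t in tf)
    scores = []
    for tf in sent_tfs:
        score = 0.0
        for t, qf in q_tf.items():
            isf = math.log((n + 1) / (0.5 + sf[t]))
            score += math.log(qf + 1) * math.log(tf[t] + 1) * isf
        scores.append(score)
    return scores


def with_context(scores, lam=0.2):
    """Blend each score with the mean score of its neighbors (hypothetical weighting)."""
    out = []
    for i, s in enumerate(scores):
        prev_s = scores[i - 1] if i > 0 else 0.0
        next_s = scores[i + 1] if i + 1 < len(scores) else 0.0
        out.append((1 - lam) * s + lam * (prev_s + next_s) / 2)
    return out


def with_length(scores, sentences, mu=0.5):
    """Boost longer sentences by (length / average length) ** mu (hypothetical factor)."""
    lengths = [len(tokenize(s)) for s in sentences]
    avg = sum(lengths) / len(lengths) if lengths else 1.0
    avg = avg or 1.0  # guard against all-empty sentences
    return [s * (l / avg) ** mu for s, l in zip(scores, lengths)]
```

Under these assumptions, a combined score could be obtained by applying `with_context` to the baseline scores and then `with_length` to the result; a sentence with no query terms can still receive a nonzero score if a neighboring sentence matches the query.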
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2013.06.004