
Effect of Training Data on Neural Retrieval

Bibliographic Details
Main Author: Vo, Danny
Format: Dissertation
Language: English
Description
Summary: This thesis investigates the impact of training data configurations on the performance of neural retrieval models, specifically focusing on the BERT model. We explore two primary configurations: shallow-based training sets, characterized by numerous queries with few relevance judgments, and depth-based training sets, featuring fewer queries with numerous relevance judgments. Using subsets sampled from the MS MARCO and LongEval datasets, we fine-tune the BERT model for sequence classification tasks and evaluate its performance using MAP, NDCG, and MRR metrics. Our findings indicate that shallow-based training sets enhance the generalization capabilities of neural retrievers, yielding superior reranking performance and robustness across diverse topics. Moreover, the study highlights the significance of dataset size and the inclusion of negative examples in optimizing model performance. These insights enhance the understanding of effective training strategies in neural information retrieval.
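
For readers unfamiliar with the setup the summary describes, the sketch below shows roughly what fine-tuning BERT as a sequence-classification (cross-encoder) reranker and scoring candidates looks like in Python with Hugging Face Transformers and PyTorch. The model name, toy query-passage pairs, hyperparameters, and the small MRR helper are illustrative assumptions; the thesis's actual data sampling, training pipeline, and evaluation code are not reproduced here.

```python
# Minimal sketch (assumptions: bert-base-uncased, toy data, default hyperparameters).
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy (query, passage, relevance) pairs, including a negative example.
pairs = [
    ("what is neural retrieval", "Neural retrieval ranks documents with learned encoders.", 1),
    ("what is neural retrieval", "The recipe calls for two cups of flour.", 0),
]
queries, passages, labels = zip(*pairs)
enc = tokenizer(list(queries), list(passages), padding=True, truncation=True, return_tensors="pt")
labels_t = torch.tensor(labels)

# One fine-tuning step: cross-entropy over relevant / not-relevant labels.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**enc, labels=labels_t)
out.loss.backward()
optimizer.step()

# Reranking: score candidate passages for the query and sort by the "relevant" logit.
model.eval()
with torch.no_grad():
    scores = model(**enc).logits[:, 1]
ranking = scores.argsort(descending=True).tolist()
print("reranked order:", ranking)

def mrr(ranked_relevance, k=10):
    """Mean Reciprocal Rank over per-query relevance lists (1 = relevant), cut off at k."""
    total = 0.0
    for rels in ranked_relevance:
        for rank, rel in enumerate(rels[:k], start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

print("toy MRR:", mrr([[labels[i] for i in ranking]]))
```

In this setup each query-passage pair is scored jointly by the fine-tuned BERT classifier, and the resulting scores induce the reranked order that metrics such as MRR (and, with graded judgments, MAP and NDCG) evaluate.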