Effect of Training Data on Neural Retrieval
Format: | Dissertation
---|---
Language: | English
Summary: This thesis investigates the impact of training data configurations on the performance of neural retrieval models, focusing on the BERT model. We explore two primary configurations: shallow training sets, characterized by many queries with few relevance judgments per query, and deep training sets, featuring fewer queries with many relevance judgments per query. Using subsets sampled from the MS MARCO and LongEval datasets, we fine-tune BERT for sequence classification and evaluate its reranking performance with the MAP, NDCG, and MRR metrics. Our findings indicate that shallow training sets improve the generalization of neural retrievers, yielding superior reranking performance and robustness across diverse topics. The study also highlights the importance of dataset size and of including negative examples when optimizing model performance. These insights deepen the understanding of effective training strategies in neural information retrieval.
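The abstract evaluates reranking with MAP, NDCG, and MRR. As a minimal sketch (not the thesis's actual evaluation code), MAP and MRR over binary relevance judgments for a set of reranked result lists can be computed as follows; the example queries are hypothetical:

```python
def reciprocal_rank(ranked_rels):
    """RR: 1/rank of the first relevant result (0.0 if none is relevant)."""
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def average_precision(ranked_rels):
    """AP: mean of precision@k taken at each position holding a relevant result."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

# Hypothetical binary relevance labels for two reranked result lists,
# one list per query (1 = relevant, 0 = not relevant).
queries = [[0, 1, 0, 1], [1, 0, 0, 0]]
mrr = sum(reciprocal_rank(q) for q in queries) / len(queries)
map_score = sum(average_precision(q) for q in queries) / len(queries)
print(mrr, map_score)  # both 0.75 for these two lists
```

Averaging over queries is what makes the shallow-vs-deep comparison meaningful: a shallow set contributes many queries to this mean, a deep set contributes few.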