
Transfer fine-tuning of BERT with phrasal paraphrases



Bibliographic Details
Published in: Computer Speech & Language, 2021-03, Vol. 66, p. 101164, Article 101164
Main Authors: Arase, Yuki, Tsujii, Junichi
Format: Article
Language:English
Description
Summary:
• Transfer fine-tuning yields representations suitable for specific tasks; in this paper we focused on sentence pair modelling.
• The method helps the BERT model converge more quickly with a smaller corpus.
• It also realizes performance gains while maintaining the model size.
• Simple features outperform elaborate ones in phrasal paraphrase classification.

Sentence pair modelling is defined as the task of identifying the semantic interaction between a sentence pair, i.e., paraphrase and textual entailment identification and semantic similarity measurement. It constitutes a set of crucial tasks for research in the area of natural language understanding. Sentence representation learning is a fundamental technology for sentence pair modelling, where the development of the BERT model realised a breakthrough. We have recently proposed transfer fine-tuning using phrasal paraphrases to allow BERT’s representations to be suitable for semantic equivalence assessment between sentences while maintaining the model size. Herein, we reveal that transfer fine-tuning with simplified feature generation allows us to generate representations that are widely effective across different types of sentence pair modelling tasks. Detailed analysis confirms that our transfer fine-tuning helps the BERT model converge more quickly with a smaller corpus for fine-tuning.
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2020.101164
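
As a rough illustration of the transfer fine-tuning idea summarised above, the following minimal sketch first fine-tunes BERT on a phrasal paraphrase classification task and then saves the weights for later fine-tuning on a downstream sentence pair task. This is not the authors' implementation: the use of the Hugging Face transformers library, the toy phrase pairs, the hyperparameters, and the output path "bert-transfer-finetuned" are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): transfer fine-tuning of BERT on a
# phrasal paraphrase classification task before downstream fine-tuning.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy phrasal paraphrase pairs: (phrase_a, phrase_b, is_paraphrase) -- placeholders only.
phrase_pairs = [
    ("reach an agreement", "come to an agreement", 1),
    ("reach an agreement", "leave the meeting", 0),
]

model.train()
for phrase_a, phrase_b, label in phrase_pairs:
    # Encode the phrase pair as a single BERT input (segment A / segment B).
    inputs = tokenizer(phrase_a, phrase_b, return_tensors="pt", truncation=True)
    outputs = model(**inputs, labels=torch.tensor([label]))
    outputs.loss.backward()  # intermediate-task (paraphrase classification) loss
    optimizer.step()
    optimizer.zero_grad()

# The transfer-fine-tuned weights become the initialisation for the downstream
# sentence pair task; the model size is unchanged relative to plain BERT.
model.save_pretrained("bert-transfer-finetuned")
tokenizer.save_pretrained("bert-transfer-finetuned")
```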