
Fine-tuning pretrained transformer encoders for sequence-to-sequence learning

Bibliographic Details
Published in: International Journal of Machine Learning and Cybernetics, 2024-05, Vol. 15 (5), p. 1711-1728
Main Authors: Bao, Hangbo; Dong, Li; Wang, Wenhui; Yang, Nan; Piao, Songhao; Wei, Furu
Format: Article
Language: English
Description
Summary: In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and expansibility of the s2s-ft method.
ISSN: 1868-8071
1868-808X
DOI: 10.1007/s13042-023-01992-6
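Note: The summary describes s2s-ft as using a unified model with well-designed self-attention masks so that a bidirectional encoder can generate text without a separate decoder. The snippet below is a minimal sketch of that idea, assuming the UniLM-style masking convention from the same line of work (source tokens attend bidirectionally within the source segment; target tokens attend to the full source plus their own left context). The function name and tensor layout are illustrative and are not taken from the paper's actual implementation.

import torch

def s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Build a (src_len + tgt_len) x (src_len + tgt_len) self-attention mask.

    Assumed convention: True = position may be attended to, False = blocked.
    - Source tokens attend bidirectionally, but only within the source segment.
    - Target tokens attend to the whole source segment and causally within
      the target segment (each position sees itself and earlier target tokens).
    """
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Source rows: full bidirectional attention over the source segment only.
    mask[:src_len, :src_len] = True

    # Target rows: attend to every source position ...
    mask[src_len:, :src_len] = True
    # ... and causally within the target segment.
    mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len)).bool()

    return mask

# Example: 4 source tokens, 3 target tokens.
print(s2s_attention_mask(4, 3).int())

Applying such a mask in every self-attention layer lets a single stack of encoder parameters process both the source and the partially generated target, which appears to be how the method avoids adding a separate decoder on top of BERT- or RoBERTa-style encoders.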