Fine-tuning pretrained transformer encoders for sequence-to-sequence learning
Published in: International Journal of Machine Learning and Cybernetics, 2024-05, Vol. 15(5), pp. 1711-1728
Main Authors:
Format: Article
Language: English
Summary: In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and expansibility of the s2s-ft method.
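
The abstract's central idea, generating with a single bidirectional encoder by constraining its self-attention mask rather than adding a decoder, can be illustrated with a short sketch. Below is a minimal example, assuming a UniLM-style mask over a packed [source; target] sequence in which source tokens attend bidirectionally within the source and target tokens attend to the full source plus earlier target positions causally; the function name build_s2s_attention_mask and the PyTorch tensor layout are illustrative assumptions, not taken from the paper.

```python
import torch


def build_s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Sketch of a sequence-to-sequence self-attention mask for a single
    encoder processing the packed sequence [source; target].

    A value of 1 means "may attend", 0 means "masked out".
    """
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.long)

    # Source block: full bidirectional attention within the source;
    # source positions never attend to (future) target tokens.
    mask[:src_len, :src_len] = 1

    # Target positions attend to every source position...
    mask[src_len:, :src_len] = 1
    # ...and causally to target positions at or before their own offset.
    mask[src_len:, src_len:] = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.long)
    )

    return mask


if __name__ == "__main__":
    # Toy example: 3 source tokens, 2 target tokens.
    print(build_s2s_attention_mask(3, 2))
```

In this sketch the same encoder weights serve both "encoding" and "decoding"; only the mask changes which positions are visible, which is consistent with the abstract's claim that no additional decoder is needed.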
ISSN: 1868-8071, 1868-808X
DOI: 10.1007/s13042-023-01992-6