
Fine-tuning pretrained transformer encoders for sequence-to-sequence learning

In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and expansibility of the s2s-ft method.
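
The "well-designed self-attention masks" mentioned in the abstract constrain a single bidirectional encoder to behave like an encoder-decoder: source positions attend to each other bidirectionally, while target positions attend to the full source and only to earlier target positions. The sketch below illustrates such a mask in PyTorch, assuming a concatenated [source; target] input; the function name and tensor layout are illustrative assumptions, not the authors' released s2s-ft implementation.

    # Minimal sketch (not the authors' code) of a sequence-to-sequence
    # self-attention mask for a single bidirectional encoder.
    import torch

    def s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
        """Boolean mask of shape (src_len + tgt_len, src_len + tgt_len);
        mask[i, j] is True when position i may attend to position j."""
        total = src_len + tgt_len
        mask = torch.zeros(total, total, dtype=torch.bool)

        # Source segment: full bidirectional attention among source tokens.
        mask[:src_len, :src_len] = True

        # Target segment: every target position sees the whole source ...
        mask[src_len:, :src_len] = True
        # ... and, causally, target positions up to and including itself.
        mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len)).bool()
        return mask

    # Example: 4 source tokens, 3 target tokens.
    print(s2s_attention_mask(4, 3).int())

Applying a mask of this shape inside a standard Transformer encoder's attention layers is what allows the same network to encode the source bidirectionally and generate the target left to right without a separate decoder stack.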

Bibliographic Details
Published in: International journal of machine learning and cybernetics, 2024-05, Vol. 15 (5), p. 1711-1728
Main Authors: Bao, Hangbo; Dong, Li; Wang, Wenhui; Yang, Nan; Piao, Songhao; Wei, Furu
Format: Article
Language: English
Subjects: Algorithms; Artificial Intelligence; Coders; Complex Systems; Computational Intelligence; Control; Engineering; Language; Mechatronics; Methods; Multilingualism; Natural language processing; Original Article; Pattern Recognition; Robotics; Semantics; Systems Biology
ISSN: 1868-8071
EISSN: 1868-808X
DOI: 10.1007/s13042-023-01992-6
Publisher: Springer Berlin Heidelberg
Source: Springer Link