Fine-tuning pretrained transformer encoders for sequence-to-sequence learning
In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and expansibility of the s2s-ft method.
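The abstract's central idea is that a single bidirectional encoder can serve as its own decoder when given a structured self-attention mask. The sketch below is a minimal illustration of that kind of mask, not the authors' released s2s-ft code: it assumes the source and target sequences are packed into one input, and the function and tensor names are hypothetical.

```python
# Sketch of a UniLM-style seq2seq self-attention mask (assumption: source and
# target tokens are packed into a single input sequence). Source positions
# attend to the whole source bidirectionally; target positions attend to the
# source plus the target prefix up to and including themselves.
import torch

def s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Boolean (src_len+tgt_len, src_len+tgt_len) mask; True = may attend."""
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Every position (source or target) may attend to all source positions.
    mask[:, :src_len] = True
    # Target positions additionally attend causally within the target segment.
    mask[src_len:, src_len:] = torch.ones(tgt_len, tgt_len).tril().bool()
    # Source positions never attend to target positions (left False above).
    return mask

# Example: 4 source tokens, 3 target tokens.
print(s2s_attention_mask(4, 3).int())
```

Roughly speaking, such a mask lets the target side be predicted left to right while the source is encoded bidirectionally; the causal, masked, and pseudo-masked fine-tuning variants named in the abstract differ in how the target-side prediction targets are constructed.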
Published in: | International journal of machine learning and cybernetics 2024-05, Vol.15 (5), p.1711-1728 |
---|---|
Main Authors: | Bao, Hangbo; Dong, Li; Wang, Wenhui; Yang, Nan; Piao, Songhao; Wei, Furu |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Artificial Intelligence; Coders; Complex Systems; Computational Intelligence; Control; Engineering; Language; Mechatronics; Methods; Multilingualism; Natural language processing; Original Article; Pattern Recognition; Robotics; Semantics; Systems Biology |
ISSN: | 1868-8071 |
EISSN: | 1868-808X |
DOI: | 10.1007/s13042-023-01992-6 |
Publisher: | Springer Berlin Heidelberg |
Source: | Springer Link |