Fine-tuning pretrained transformer encoders for sequence-to-sequence learning
In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and expansibility of the s2s-ft method.
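The abstract's central idea is that a single bidirectional encoder can serve as its own decoder when given a structured self-attention mask. The sketch below is a minimal illustration of that kind of mask, not the authors' released s2s-ft code: it assumes the source and target sequences are packed into one input, and the function and tensor names are hypothetical.

```python
# Sketch of a UniLM-style seq2seq self-attention mask (assumption: source and
# target tokens are packed into a single input sequence). Source positions
# attend to the whole source bidirectionally; target positions attend to the
# source plus the target prefix up to and including themselves.
import torch

def s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Boolean (src_len+tgt_len, src_len+tgt_len) mask; True = may attend."""
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Every position (source or target) may attend to all source positions.
    mask[:, :src_len] = True
    # Target positions additionally attend causally within the target segment.
    mask[src_len:, src_len:] = torch.ones(tgt_len, tgt_len).tril().bool()
    # Source positions never attend to target positions (left False above).
    return mask

# Example: 4 source tokens, 3 target tokens.
print(s2s_attention_mask(4, 3).int())
```

Roughly speaking, such a mask lets the target side be predicted left to right while the source is encoded bidirectionally; the causal, masked, and pseudo-masked fine-tuning variants named in the abstract differ in how the target-side prediction targets are constructed.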
Published in: | International journal of machine learning and cybernetics 2024-05, Vol.15 (5), p.1711-1728 |
---|---|
Main Authors: | Bao, Hangbo; Dong, Li; Wang, Wenhui; Yang, Nan; Piao, Songhao; Wei, Furu |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Artificial Intelligence; Coders; Complex Systems; Computational Intelligence; Control; Engineering; Language; Mechatronics; Methods; Multilingualism; Natural language processing; Original Article; Pattern Recognition; Robotics; Semantics; Systems Biology |
ISSN: | 1868-8071 |
EISSN: | 1868-808X |
DOI: | 10.1007/s13042-023-01992-6 |
Publisher: | Springer Berlin Heidelberg |
Source: | Springer Link |