
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few other popular languages, which limits their impact and accessibility for speakers of many other languages. Among the few very recent works exploring instruction tuning for LLMs in multiple languages, SFT has been the only approach used. This leaves a significant gap for RLHF-based fine-tuned LLMs in diverse languages and raises important questions about how much RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranking data in 26 diverse languages to facilitate experiments and the development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF over SFT for multilingual instruction tuning across different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.
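The response-ranked data mentioned in the abstract is what drives the reward-modeling step of RLHF: given two responses to the same prompt where annotators preferred one, the reward model learns to score the preferred response higher. Below is a minimal, generic sketch of that pairwise ranking objective in PyTorch; the function and variable names are illustrative assumptions, not code from the Okapi repository.

```python
# Minimal sketch of the reward-modeling objective commonly used in RLHF
# (Bradley-Terry pairwise ranking loss). Illustrative only: names and
# stand-in tensors are assumptions, not Okapi's released code.
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(scores_chosen: torch.Tensor,
                          scores_rejected: torch.Tensor) -> torch.Tensor:
    """Push the reward of each human-preferred response above the
    reward of the dispreferred response for the same prompt."""
    return -F.logsigmoid(scores_chosen - scores_rejected).mean()

# Stand-in scores: in practice these come from a scalar-head reward
# model run on (prompt, response) pairs from the response-ranked data.
scores_chosen = torch.randn(8, requires_grad=True)
scores_rejected = torch.randn(8, requires_grad=True)

loss = pairwise_ranking_loss(scores_chosen, scores_rejected)
loss.backward()  # gradients would flow into the reward model's weights
print(float(loss))
```

In a full RLHF pipeline, the trained reward model then scores the policy model's samples during reinforcement-learning fine-tuning (e.g., with PPO), which is how ranked human preferences end up shaping the instruction-tuned LLM.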

Bibliographic Details
Published in: arXiv.org, 2023-08
Main Authors: Lai, Viet Dac; Nguyen, Chien Van; Ngo, Nghia Trung; Nguyen, Thuat; Dernoncourt, Franck; Rossi, Ryan A; Nguyen, Thien Huu
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Accessibility; Datasets; Feedback; Languages; Large language models; Machine learning; Multilingualism; R&D; Research & development
Online Access: https://github.com/nlp-uoregon/Okapi (framework and resources)