
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety still remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate 10 S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when scoring holistically the toxicity of a prompt; and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g. microaggressions, bias). We release this dataset to contribute to further reduce harmful uses of these models and improve their safe deployment.

Bibliographic Details
Published in: arXiv.org, 2024-12
Main Authors: de Wynter, Adrian, Watts, Ishaan, Wongsangaroonsri, Tua, Zhang, Minghui, Farra, Noura, Altıntoprak, Nektar Ege, Baur, Lena, Claudet, Samantha, Gajdusek, Pavel, Gören, Can, Gu, Qilong, Kaminska, Anna, Kaminski, Tomasz, Kuo, Ruby, Kyuba, Akiko, Lee, Jongho, Mathur, Kartik, Merok, Petter, Milovanović, Ivana, Paananen, Nani, Paananen, Vesa-Matti, Pavlenko, Anna, Pereira Vidal, Bruno, Strika, Luciano, Tsao, Yueh, Turcato, Davide, Vakhno, Oleksandr, Velcsov, Judit, Vickers, Anna, Visser, Stéphanie, Widarmanto, Herdyan, Zaikin, Andrey, Chen, Si-Qing
Format: Article
Language: English
Subjects: Large language models; Multilingualism; Safety; Toxicity
published Ithaca: Cornell University Library, arXiv.org, 2024-12-16
rights 2024. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-12
language eng
source Publicly Available Content Database
subjects Large language models; Multilingualism; Safety; Toxicity