TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Various robustness evaluation methodologies from different perspectives have been proposed for different natural language processing (NLP) tasks. These methods have often focused on either universal or task-specific generalization capabilities. In this work, we propose TextFlint, a multilingual robustness evaluation platform for NLP tasks that incorporates universal text transformations, task-specific transformations, adversarial attacks, subpopulations, and their combinations to provide comprehensive robustness analysis. TextFlint enables practitioners to automatically evaluate their models from all aspects, or to customize their evaluations as desired, with just a few lines of code. To guarantee user acceptability, all the text transformations are linguistically based, and we provide a human evaluation for each one. TextFlint generates complete analytical reports as well as targeted augmented data to address shortcomings in a model's robustness. To validate TextFlint's utility, we performed large-scale empirical evaluations (over 67,000 evaluations) of state-of-the-art deep learning models, classic supervised methods, and real-world systems. Almost all models showed significant performance degradation, including a decline of more than 50% in BERT's prediction accuracy on tasks such as aspect-level sentiment classification, named entity recognition, and natural language inference. We therefore call for robustness to be included in model evaluation, so as to promote the healthy development of NLP technology.
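The "few lines of code" workflow the abstract describes reduces to a simple loop: apply a label-preserving transformation to each input, then compare model accuracy on the original and transformed sets. The sketch below is not TextFlint's actual API; it is a self-contained toy illustration of that transformation-then-re-evaluate idea, in which `toy_model` and `swap_typo` are hypothetical stand-ins for a real classifier and for one of the toolkit's linguistically based universal transformations.

```python
import random

# Hypothetical toy classifier standing in for a real NLP model: it labels a
# review positive if it contains the literal token "good". Real models fail
# more subtly, but the evaluation loop is the same.
def toy_model(text: str) -> int:
    return 1 if "good" in text.split() else 0

# A universal, task-agnostic perturbation in the spirit of the paper's
# transformations: swap two adjacent characters inside one longer word,
# simulating a keyboard typo. The gold label is assumed unchanged.
def swap_typo(text: str, rng: random.Random) -> str:
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return text
    i = rng.choice(candidates)
    chars = list(words[i])
    j = rng.randrange(len(chars) - 1)
    chars[j], chars[j + 1] = chars[j + 1], chars[j]
    words[i] = "".join(chars)
    return " ".join(words)

def accuracy(model, samples) -> float:
    return sum(model(x) == y for x, y in samples) / len(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    data = [("the movie was good fun", 1),
            ("a good and moving story", 1),
            ("dull plot and flat acting", 0),
            ("not worth the ticket", 0)]
    # Robustness evaluation: accuracy on original vs. transformed inputs.
    transformed = [(swap_typo(x, rng), y) for x, y in data]
    print("original   :", accuracy(toy_model, data))
    print("transformed:", accuracy(toy_model, transformed))
```

On this toy data a single typo is enough to flip predictions that hinge on one surface token, which is exactly the kind of brittleness the paper's 67,000-plus evaluations surface at scale.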
Published in: | arXiv.org 2021-05 |
---|---|
Main Authors: | Gui, Tao; Wang, Xiao; Zhang, Qi; Liu, Qin; Zou, Yicheng; Zhou, Xin; Zheng, Rui; Zhang, Chong; Wu, Qinzhuo; Ye, Jiacheng; Pang, Zexiong; Zhang, Yongxin; Li, Zhengyan; Ma, Ruotian; Zichu Fei; Cai, Ruijian; Zhao, Jun; Hu, Xingwu; Yan, Zhiheng; Tan, Yiding; Hu, Yuan; Bian, Qiyuan; Liu, Zhihua; Zhu, Bolin; Qin, Shan; Xing, Xiaoyu; Fu, Jinlan; Zhang, Yue; Peng, Minlong; Zheng, Xiaoqing; Zhou, Yaqian; Wei, Zhongyu; Qiu, Xipeng; Huang, Xuanjing |
Format: | Article |
Language: | English |
Identifier: | EISSN: 2331-8422 |
Subjects: | Empirical analysis; Machine learning; Multilingualism; Natural language processing; Performance degradation; Robustness; State-of-the-art reviews; Transformations |
Online Access: | ProQuest (Open Access) |
Publisher: | Ithaca: Cornell University Library, arXiv.org |