
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark

Bibliographic Details
Published in: arXiv.org, 2022-06
Main Authors: Yao, Yuan, Dong, Qingxiu, Guan, Jian, Cao, Boxi, Zhang, Zhengyan, Xiao, Chaojun, Wang, Xiaozhi, Qi, Fanchao, Bao, Junwei, Nie, Jinran, Zeng, Zheni, Gu, Yuxian, Zhou, Kun, Huang, Xuancheng, Li, Wenhao, Ren, Shuhuai, Lu, Jinliang, Xu, Chengqiang, Wang, Huadong, Zeng, Guoyang, Zhou, Zile, Zhang, Jiajun, Li, Juanzi, Huang, Minlie, Yan, Rui, He, Xiaodong, Wan, Xiaojun, Zhao, Xin, Xu, Sun, Liu, Yang, Liu, Zhiyuan, Han, Xianpei, Yang, Erhong, Sui, Zhifang, Sun, Maosong
Format: Article
Language: English
Description
Summary: Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework. To facilitate CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.
ISSN: 2331-8422
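The summary describes scores reported at several levels of a capability-task-dataset hierarchy. The sketch below illustrates one way such multi-level aggregation could work. It assumes a plain unweighted mean at each level and uses optional capability weights as a stand-in for the customizable leaderboard. The function `multi_level_scores`, the placeholder capability/task/dataset names, and the numbers are all hypothetical; CUGE's actual normalization and weighting may differ.

```python
from statistics import mean
from typing import Dict, Optional

# Hypothetical hierarchy: capability -> task -> dataset -> score (0-100).
# Names and numbers are placeholders, not CUGE datasets or results.
Hierarchy = Dict[str, Dict[str, Dict[str, float]]]


def multi_level_scores(
    results: Hierarchy,
    capability_weights: Optional[Dict[str, float]] = None,
) -> dict:
    """Aggregate dataset scores upward: dataset -> task -> capability -> overall.

    Assumption: each level is the unweighted mean of the level below, and the
    overall score optionally weights capabilities (a stand-in for custom criteria).
    """
    task_scores: Dict[tuple, float] = {}
    capability_scores: Dict[str, float] = {}
    for capability, tasks in results.items():
        per_task = []
        for task, datasets in tasks.items():
            score = mean(datasets.values())  # task score = mean over its datasets
            task_scores[(capability, task)] = score
            per_task.append(score)
        capability_scores[capability] = mean(per_task)  # mean over tasks

    if capability_weights is None:
        overall = mean(capability_scores.values())
    else:
        # Weighted mean over capabilities; weights are user-chosen.
        total = sum(capability_weights[c] for c in capability_scores)
        overall = sum(capability_weights[c] * s for c, s in capability_scores.items()) / total

    return {"task": task_scores, "capability": capability_scores, "overall": overall}


example: Hierarchy = {
    "capability_A": {
        "task_1": {"dataset_x": 80.0, "dataset_y": 60.0},
        "task_2": {"dataset_z": 50.0},
    },
    "capability_B": {"task_3": {"dataset_w": 90.0}},
}

scores = multi_level_scores(example)
print(scores["overall"])  # 75.0
```

Keeping the task and capability scores separate from the overall score means a different weighting can be applied at the top level without recomputing the lower levels. This is one plausible reason a hierarchical scheme suits a customizable leaderboard.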