Loading…

CPM-2: Large-scale Cost-effective Pre-trained Language Models

In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2021-06
Main Authors: Zhang, Zhengyan, Gu, Yuxian, Xu, Han, Chen, Shengqi, Xiao, Chaojun, Sun, Zhenbo, Yao, Yuan, Fanchao Qi, Guan, Jian, Pei Ke, Cai, Yanzheng, Zeng, Guoyang, Tan, Zhixing, Liu, Zhiyuan, Huang, Minlie, Han, Wentao, Liu, Yang, Zhu, Xiaoyan, Sun, Maosong
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of InfMoE when conducting inference of large-scale models having tens of billions of parameters on a single GPU. All source code and model parameters are available at https://github.com/TsinghuaAI/CPM.
ISSN:2331-8422