Self-supervised Regularization for Text Classification

Bibliographic Details
Published in:	Transactions of the Association for Computational Linguistics 2021-01, Vol.9, p.641-656
Main Authors:	Zhou, Meng; Li, Zechen; Xie, Pengtao
Format:	Article
Language:	English
Subjects:	Classification; Labels; Machine learning; Regularization; Self-supervised learning; Text categorization; Texts; Unsupervised learning

Description
Summary:	Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., ) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is defined purely on the input texts, without using any human-provided labels. Training a model on an SSL task can prevent it from overfitting to the limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at .
ISSN:	2307-387X
DOI:	10.1162/tacl_a_00389
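
Illustration:	The summary describes a supervised classification task and an unsupervised SSL task trained simultaneously. A minimal sketch of one way such a joint objective could look is given here, assuming the two losses are combined as a weighted sum and that the SSL task is masked-token prediction. The function names, the `ssl_weight` parameter, and its default value are hypothetical and are not taken from the record or from the authors' code.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def ssl_reg_loss(class_logits, label, token_logits, masked_targets, ssl_weight=0.1):
    """Classification loss plus a weighted self-supervised loss (assumed form).

    class_logits   : scores over classes for one text
    label          : human-provided class index
    token_logits   : one list of vocabulary scores per masked position
    masked_targets : original token id at each masked position; these come
                     from the input text itself, so no human labels are used
    ssl_weight     : hypothetical trade-off hyperparameter
    """
    cls_loss = cross_entropy(class_logits, label)
    if masked_targets:
        ssl_loss = sum(
            cross_entropy(scores, tok)
            for scores, tok in zip(token_logits, masked_targets)
        ) / len(masked_targets)
    else:
        ssl_loss = 0.0  # nothing was masked, so the SSL term vanishes
    return cls_loss + ssl_weight * ssl_loss


if __name__ == "__main__":
    # Toy example: 2 classes, 4-token vocabulary, one masked position.
    loss = ssl_reg_loss([1.5, -0.5], 0, [[0.2, 0.1, 2.0, -1.0]], [2])
    print(f"joint loss: {loss:.4f}")
```

In this sketch the SSL term acts as the regularizer: it depends only on the input text, so it keeps contributing a training signal even when few class labels are available.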