Unified benchmark for zero-shot Turkish text classification

Bibliographic Details
Published in:	Information processing & management 2023-05, Vol.60 (3), p.103298, Article 103298
Main Authors:	Çelik, Emrecan; Dalyan, Tuğba
Format:	Article
Language:	English
Subjects:	Masked language modeling; Natural language inference; Next sentence prediction; Text classification; Zero-shot learning

Description
Summary:	Effective learning schemes such as fine-tuning, zero-shot learning, and few-shot learning have been widely used to achieve considerable performance with only a handful of annotated training examples. In this paper, we present a unified benchmark for zero-shot text classification in Turkish. For this purpose, we evaluated three methods, namely Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model based on Masked Language Modeling (MLM) and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We used pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results showed that ConvBERT with the NLI method yields the best results, at 79%, and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by applying transformer models not previously tried for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
• A unified benchmark for the zero-shot Turkish text classification task.
• Monolingual models perform better than multilingual models on the zero-shot (ZS) text classification task.
• Masked language modeling with word embeddings can be used for ZS text classification.
• MLM with word embeddings yields worse results than the other methods.
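The summary names, but does not spell out, the MLM-plus-word-embeddings approach. As a rough illustration of how such a scheme might map masked-token predictions to class labels, here is a minimal Python sketch. It is not the authors' implementation: it assumes a Turkish MLM has already produced top-k fill-ins with probabilities for a prompt containing [MASK], and the prompt, the label words, the toy 2-d embeddings, and the function names `cosine` and `classify_from_mask_fills` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_from_mask_fills(
    mask_fills: List[Tuple[str, float]],
    label_words: Dict[str, str],
    embeddings: Dict[str, Vector],
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical scorer: each label gets the MLM-probability-weighted cosine
    similarity between the predicted [MASK] fill-ins and that label's word vector."""
    scores: Dict[str, float] = {}
    for label, word in label_words.items():
        label_vec = embeddings[word]
        score = 0.0
        for token, prob in mask_fills:
            vec = embeddings.get(token)
            if vec is None:  # skip fill-ins that have no embedding
                continue
            score += prob * cosine(vec, label_vec)
        scores[label] = score
    best = max(scores, key=scores.get)  # highest-scoring label wins
    return best, scores


# Toy 2-d embeddings, invented purely for illustration (not real pre-trained vectors).
toy_embeddings = {
    "spor": [1.0, 0.0],
    "futbol": [0.9, 0.1],
    "maç": [0.8, 0.2],
    "ekonomi": [0.0, 1.0],
}

# Hypothetical top-k MLM fill-ins for a prompt such as "Bu haber [MASK] hakkındadır."
fills = [("futbol", 0.5), ("maç", 0.3), ("ekonomi", 0.1), ("dün", 0.05)]
label, scores = classify_from_mask_fills(
    fills, {"sports": "spor", "economy": "ekonomi"}, toy_embeddings
)
print(label)  # sports
```

Weighting by the MLM probabilities lets several related fill-ins (here "futbol" and "maç") jointly support one label, while fill-ins with no embedding, such as "dün", are simply ignored; real pre-trained embeddings would have hundreds of dimensions rather than two.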
ISSN:	0306-4573, 1873-5371
DOI:	10.1016/j.ipm.2023.103298