
Fine-tuning techniques and data augmentation on transformer-based models for conversational texts and noisy user-generated content

Bibliographic Details
Main Authors: Jiang, Mike Tian-Jian, Wu, Shih-Hung, Chen, Yi-Kun, Gu, Zhao-Xian, Chiang, Cheng-Jhe, Wu, Yueh-Chia, Huang, Yu-Chen, Chiu, Cheng-Han, Shaw, Sheng-Ru, Day, Min-Yuh
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Description
Summary: Transfer learning and Transformer-based language models play important roles in the modern natural language processing research community. In this paper, we propose Transformer model fine-tuning and data augmentation (TMFTDA) techniques for conversational texts and noisy user-generated content. We use two NTCIR-15 tasks, namely the first Dialogue Evaluation (DialEval-1) task and the second Numeral Attachment in Financial Tweets (FinNum-2) task, to evaluate the efficacy of TMFTDA. Experimental results show that TMFTDA substantially outperforms the Bidirectional Long Short-Term Memory (Bi-LSTM) baseline model in multi-turn dialogue system evaluation on DialEval-1's Dialogue Quality (DQ) and Nugget Detection (ND) subtasks. Moreover, TMFTDA performs at a satisfactory level on FinNum-2 with a Cross-lingual Language Model based on a Robustly Optimized BERT Pretraining Approach (XLM-RoBERTa). The research contribution of this paper is that we shed some light on the usefulness of TMFTDA for conversational texts and noisy user-generated content in social media text analytics.
ISSN:2473-991X
DOI:10.1109/ASONAM49781.2020.9381329
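
Illustrative sketch: the summary above describes fine-tuning a Transformer (e.g., XLM-RoBERTa) with data augmentation for noisy conversational text. The Python sketch below shows one minimal way such a pipeline can look using the Hugging Face transformers library; it is not the authors' TMFTDA implementation, and the checkpoint name, label set, toy utterances, word-dropout augmentation, and hyperparameters are all placeholder assumptions chosen for illustration.

import random

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"       # assumed checkpoint, not from the paper
LABELS = ["no_nugget", "nugget"]      # placeholder label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

def augment(text: str, p_drop: float = 0.1) -> str:
    """Naive augmentation: randomly drop words from noisy user-generated text."""
    words = text.split()
    kept = [w for w in words if random.random() > p_drop]
    return " ".join(kept) if kept else text

class TurnDataset(torch.utils.data.Dataset):
    """Wraps (utterance, label) pairs and applies augmentation on the fly."""
    def __init__(self, pairs, train=True):
        self.pairs, self.train = pairs, train
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        text, label = self.pairs[idx]
        if self.train:
            text = augment(text)
        enc = tokenizer(text, truncation=True, max_length=128,
                        padding="max_length", return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(label)
        return item

# Toy examples only; the real tasks use DialEval-1 / FinNum-2 data.
train_data = TurnDataset([("u hv my order numbr?", 1),
                          ("hello, anyone there", 0)])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmftda-demo",
                           per_device_train_batch_size=8,
                           num_train_epochs=3),
    train_dataset=train_data,
)
trainer.train()

Applying the word-dropout inside __getitem__ means each epoch sees a different perturbed variant of every utterance, which is one common way to realize data augmentation for noisy text without enlarging the stored dataset.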