
Fine-tuning techniques and data augmentation on transformer-based models for conversational texts and noisy user-generated content

Bibliographic Details
Main Authors: Jiang, Mike Tian-Jian, Wu, Shih-Hung, Chen, Yi-Kun, Gu, Zhao-Xian, Chiang, Cheng-Jhe, Wu, Yueh-Chia, Huang, Yu-Chen, Chiu, Cheng-Han, Shaw, Sheng-Ru, Day, Min-Yuh
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Description
Summary: Transfer learning and Transformer-based language models play important roles in the modern natural language processing research community. In this paper, we propose Transformer model fine-tuning and data augmentation (TMFTDA) techniques for conversational texts and noisy user-generated content. We use two NTCIR-15 tasks, namely the first Dialogue Evaluation (DialEval-1) task and the second Numeral Attachment in Financial Tweets (FinNum-2) task, to evaluate the efficacy of TMFTDA. Experimental results show that TMFTDA substantially outperforms the Bidirectional Long Short-Term Memory (Bi-LSTM) baseline model in multi-turn dialogue system evaluation on DialEval-1's Dialogue Quality (DQ) and Nugget Detection (ND) subtasks. Moreover, TMFTDA performs at a satisfactory level on FinNum-2 with a Cross-lingual Language Model based on a Robustly Optimized BERT Pretraining Approach (XLM-RoBERTa). The research contribution of this paper is that we shed some light on the usefulness of TMFTDA for conversational texts and noisy user-generated content in social media text analytics.
ISSN:2473-991X
DOI:10.1109/ASONAM49781.2020.9381329
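
Illustrative sketch: the summary above describes fine-tuning a Transformer (e.g., XLM-RoBERTa) with data augmentation for noisy conversational text. The Python sketch below shows one minimal way such a pipeline can look using the Hugging Face transformers library; it is not the authors' TMFTDA implementation, and the checkpoint name, label set, toy utterances, word-dropout augmentation, and hyperparameters are all placeholder assumptions chosen for illustration.

import random

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"       # assumed checkpoint, not from the paper
LABELS = ["no_nugget", "nugget"]      # placeholder label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

def augment(text: str, p_drop: float = 0.1) -> str:
    """Naive augmentation: randomly drop words from noisy user-generated text."""
    words = text.split()
    kept = [w for w in words if random.random() > p_drop]
    return " ".join(kept) if kept else text

class TurnDataset(torch.utils.data.Dataset):
    """Wraps (utterance, label) pairs and applies augmentation on the fly."""
    def __init__(self, pairs, train=True):
        self.pairs, self.train = pairs, train
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        text, label = self.pairs[idx]
        if self.train:
            text = augment(text)
        enc = tokenizer(text, truncation=True, max_length=128,
                        padding="max_length", return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(label)
        return item

# Toy examples only; the real tasks use DialEval-1 / FinNum-2 data.
train_data = TurnDataset([("u hv my order numbr?", 1),
                          ("hello, anyone there", 0)])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmftda-demo",
                           per_device_train_batch_size=8,
                           num_train_epochs=3),
    train_dataset=train_data,
)
trainer.train()

Applying the word-dropout inside __getitem__ means each epoch sees a different perturbed variant of every utterance, which is one common way to realize data augmentation for noisy text without enlarging the stored dataset.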