A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation
Published in: ACM Transactions on Asian and Low-Resource Language Information Processing, 2024-04, Vol. 23 (4), pp. 1-24, Article 54
Main Authors: , , ,
Format: Article
Language: English
Summary: Several methodologies have recently been proposed to enhance the performance of low-resource Neural Machine Translation (NMT). However, these techniques have yet to be explored thoroughly for the low-resource Thai and Myanmar languages. Therefore, we first applied augmentation techniques such as SwitchOut and Ciphertext-Based Data Augmentation (CipherDAug) to improve NMT performance in these languages. Second, we enhanced NMT performance by fine-tuning the pre-trained Multilingual Denoising BART model (mBART), where BART denotes Bidirectional and Auto-Regressive Transformer. We implemented three NMT systems, Transformer+SwitchOut, Multi-Source Transformer+CipherDAug, and fine-tuned mBART, for bidirectional translation of the Thai-English-Myanmar language pairs from the ASEAN-MT corpus. Experimental results showed that Multi-Source Transformer+CipherDAug significantly improved Bilingual Evaluation Understudy (BLEU), Character n-gram F-score (ChrF), and Translation Error Rate (TER) scores over the first baseline (a vanilla Transformer) and the second baseline (an Edit-Based Transformer). The model achieved notable BLEU scores: 37.9 (English-to-Thai), 42.7 (Thai-to-English), 28.9 (English-to-Myanmar), 31.2 (Myanmar-to-English), 25.3 (Thai-to-Myanmar), and 25.5 (Myanmar-to-Thai). The fine-tuned mBART model also considerably outperformed the two baselines, except on the Myanmar-to-English pair. SwitchOut improved over the second baseline in all pairs and performed similarly to the first baseline in most cases. Finally, we performed detailed analyses verifying that the CipherDAug and mBART models can help improve low-resource NMT performance for the Thai and Myanmar languages.
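
As background for the SwitchOut technique named in the summary, the sketch below shows its core idea: sample how many tokens to corrupt from a temperature-controlled distribution, then swap those positions for uniform draws from the vocabulary. This is a minimal illustrative sketch, not the authors' implementation; the function name, the simplified sampling distribution, and the single-sequence signature are assumptions.

    import math
    import random

    def switchout(tokens, vocab, tau=1.0):
        # SwitchOut-style corruption (Wang et al., 2018), simplified sketch:
        # sample k with p(k) proportional to exp(-k / tau) over k = 0..len(tokens),
        # then replace k randomly chosen positions with uniform vocabulary draws.
        n = len(tokens)
        weights = [math.exp(-k / tau) for k in range(n + 1)]
        k = random.choices(range(n + 1), weights=weights)[0]
        out = list(tokens)
        for i in random.sample(range(n), k):
            out[i] = random.choice(vocab)  # hypothetical uniform replacement policy
        return out

    # Example: corrupt one tokenized sentence.
    print(switchout(["i", "like", "machine", "translation"], vocab=["cat", "dog", "run"]))

SwitchOut as originally proposed corrupts both the source and target sides of each training pair; the sketch handles one sequence at a time, so it would be applied to each side independently.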
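For the mBART fine-tuning route, a minimal inference sketch using the Hugging Face Transformers API follows. The mBART-50 checkpoint and the th_TH/en_XX language codes are assumptions for illustration (the summary does not name the exact checkpoint), and a real experiment would fine-tune the model on ASEAN-MT pairs before generating.

    from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

    # Assumed checkpoint: multilingual mBART-50, which covers both Thai (th_TH)
    # and Myanmar (my_MM); the paper's exact checkpoint may differ.
    name = "facebook/mbart-large-50-many-to-many-mmt"
    model = MBartForConditionalGeneration.from_pretrained(name)
    tokenizer = MBart50TokenizerFast.from_pretrained(name)

    tokenizer.src_lang = "th_TH"  # Thai source sentence
    batch = tokenizer("ฉันชอบการแปลภาษา", return_tensors="pt")
    generated = model.generate(
        **batch,
        # Force the decoder to start in the target language (English here).
        forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],
    )
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))

Swapping src_lang and the forced BOS code gives the reverse direction, which is how a single multilingual checkpoint serves all six translation directions reported in the summary.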
ISSN: 2375-4699, 2375-4702
DOI: 10.1145/3645111
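
As a reference point for the evaluation metrics reported in the summary, BLEU, ChrF, and TER can all be computed with the sacreBLEU library, as in the sketch below. Whether the authors used sacreBLEU is not stated in this record, and the hypothesis and reference strings are placeholders.

    from sacrebleu.metrics import BLEU, CHRF, TER

    # Placeholder system outputs and references; a real evaluation would use
    # detokenized test-set translations from the ASEAN-MT corpus.
    hyps = ["the cat sat on the mat"]
    refs = [["the cat is on the mat"]]  # one reference stream, aligned with hyps

    print(BLEU().corpus_score(hyps, refs))   # higher is better
    print(CHRF().corpus_score(hyps, refs))   # character n-gram F-score
    print(TER().corpus_score(hyps, refs))    # edit-based; lower is better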