
A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation

Bibliographic Details
Published in: ACM Transactions on Asian and Low-Resource Language Information Processing, 2024-04, Vol. 23 (4), p. 1-24, Article 54
Main Authors: San, Mya Ei; Usanavasin, Sasiporn; Thu, Ye Kyaw; Okumura, Manabu
Format: Article
Language: English
Description
Summary: Several methodologies have recently been proposed to enhance the performance of low-resource Neural Machine Translation (NMT), but these techniques have yet to be explored thoroughly for the low-resource Thai and Myanmar languages. We therefore first applied augmentation techniques, namely SwitchOut and Ciphertext-Based Data Augmentation (CipherDAug), to improve NMT performance for these languages. Second, we enhanced NMT performance by fine-tuning the pre-trained Multilingual Denoising BART model (mBART), where BART denotes Bidirectional and Auto-Regressive Transformer. We implemented three NMT systems, namely Transformer+SwitchOut, Multi-Source Transformer+CipherDAug, and fine-tuned mBART, for bidirectional translation of the Thai-English-Myanmar language pairs from the ASEAN-MT corpus. Experimental results showed that Multi-Source Transformer+CipherDAug significantly improved Bilingual Evaluation Understudy (BLEU), Character n-gram F-score (ChrF), and Translation Error Rate (TER) scores over both baselines: a vanilla Transformer (first baseline) and an Edit-Based Transformer (second baseline). It achieved notable BLEU scores of 37.9 (English-to-Thai), 42.7 (Thai-to-English), 28.9 (English-to-Myanmar), 31.2 (Myanmar-to-English), 25.3 (Thai-to-Myanmar), and 25.5 (Myanmar-to-Thai). The fine-tuned mBART model also considerably outperformed the two baselines, except on the Myanmar-to-English pair. SwitchOut improved over the second baseline on all pairs and performed similarly to the first baseline in most cases. Finally, detailed analyses confirmed that the CipherDAug and mBART approaches can help improve low-resource NMT performance for the Thai and Myanmar languages.
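Illustration: As a minimal sketch of the SwitchOut idea referenced above, the Python snippet below samples how many token positions to corrupt from a temperature-controlled distribution (smaller tau keeps augmented sentences closer to the original) and replaces the chosen tokens with random vocabulary items. The function name, the simplified sampling distribution, and the toy vocabulary are illustrative assumptions, not the authors' exact implementation; in practice the corruption is applied independently to the source and target sides of each training pair.

import math
import random

def switchout(tokens, vocab, tau=1.0):
    # Sample the number of corrupted positions n from a distribution
    # proportional to exp(-n / tau), n = 0..len(tokens); this is a
    # simplified stand-in for SwitchOut's Hamming-distance prior.
    L = len(tokens)
    weights = [math.exp(-n / tau) for n in range(L + 1)]
    n_hat = random.choices(range(L + 1), weights=weights, k=1)[0]
    # Replace n_hat uniformly chosen positions with tokens drawn
    # uniformly at random from the vocabulary.
    out = list(tokens)
    for i in random.sample(range(L), n_hat):
        out[i] = random.choice(vocab)
    return out

# Toy usage: augment one tokenized sentence.
vocab = ["the", "cat", "dog", "sat", "ran", "on", "mat"]
print(switchout(["the", "cat", "sat", "on", "the", "mat"], vocab, tau=0.5))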
ISSN: 2375-4699
eISSN: 2375-4702
DOI: 10.1145/3645111