Tone prediction and orthographic conversion for Basaa

Bibliographic Details
Published in: arXiv.org 2022-10
Main Authors: Nikitin, Ilya; O'Connor, Brian; Safonova, Anastasia
Format: Article
Language: English
Description
Summary: In this paper, we present a seq2seq approach for transliterating missionary Basaa orthographies into the official orthography. Our model uses corpora in the missionary and official Basaa orthographies, pre-trained using BERT. Because Basaa is a low-resource language, we use the mT5 model for our project. Before training, we pre-processed the corpora by eliminating one-to-one correspondences between spellings and by unifying characters that are variably written as one or two characters into a single-character form. Our best mT5 model achieved a character error rate (CER) of 12.6747 and a word error rate (WER) of 40.1012.
ISSN: 2331-8422
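
The summary reports CER and WER without showing how they were computed. As a minimal sketch, assuming the standard definitions (Levenshtein edit distance divided by reference length, over characters for CER and over whitespace-separated words for WER), the metrics could be computed as below. The function names are hypothetical, and the NFC normalization step is only an assumed stand-in for the single-character unification mentioned in the summary; the record does not describe the authors' actual evaluation or pre-processing code.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    # Assumption: compose base letter + combining mark where Unicode
    # defines a precomposed form, so equivalent spellings compare equal.
    return unicodedata.normalize("NFC", text)


def cer(ref, hyp):
    """Character error rate as a fraction of the reference length."""
    ref, hyp = _nfc(ref), _nfc(hyp)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate as a fraction of the reference word count."""
    ref_words, hyp_words = _nfc(ref).split(), _nfc(hyp).split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Both functions return fractions; multiply by 100 for percentages. The record does not state whether the reported figures are percentages.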