A New Method for Vietnamese Text Correction using Sequence Tagging Models

Bibliographic Details
Main Authors: Thi, Thoa Bui; Hoang, Hoa Luong Nguyen; Thi, Hien Nguyen; Viet, Anh Phan
Format: Conference Proceeding
Language: English
Description
Summary: In this paper, we present a new approach to Vietnamese text error correction. The corrector consists of a Transformer that encodes the input sequence and sequence taggers that perform both error detection and correction. The taggers use special tokens to handle insertions, deletions, and substitutions. Correction is applied iteratively until the stopping criteria are met, and at each step only minimal spans of tokens in the source sentence are corrected. This design has two advantages: 1) it detects and corrects various error types in Vietnamese text, and 2) it does not produce uncontrollable outputs, as generative models can. As a result, the approach performs well: on a realistic dataset, our proposed model detects 79.5% of errors and corrects 62.7% of them; the highest SacreBLEU score is 86.10, which is a promising result.
ISSN: 2694-4804
DOI: 10.1109/KSE59128.2023.10299446
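
The iterative tag-and-apply procedure outlined in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tag names ($KEEP, $DELETE, $REPLACE_x, $APPEND_x), the stopping rule (all tags are $KEEP, or a step limit is reached), and the rule-based toy_tagger that stands in for the Transformer encoder and tagging heads are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical tag vocabulary; the paper's actual special tokens are not
# given in this record.
KEEP = "$KEEP"
DELETE = "$DELETE"
REPLACE = "$REPLACE_"  # followed by the replacement token (substitution)
APPEND = "$APPEND_"    # followed by a token to insert after the current one


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per source token and return the edited sequence."""
    out: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == KEEP:
            out.append(token)
        elif tag == DELETE:
            continue
        elif tag.startswith(REPLACE):
            out.append(tag[len(REPLACE):])
        elif tag.startswith(APPEND):
            out.append(token)
            out.append(tag[len(APPEND):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out


def correct(
    tokens: List[str],
    tagger: Callable[[List[str]], List[str]],
    max_steps: int = 5,
) -> List[str]:
    """Re-tag and edit until no edits are predicted or max_steps is reached.

    The stopping rule here is an assumption, not the paper's criteria.
    """
    for _ in range(max_steps):
        tags = tagger(tokens)
        if all(tag == KEEP for tag in tags):
            break
        tokens = apply_tags(tokens, tags)
    return tokens


def toy_tagger(tokens: List[str]) -> List[str]:
    """Rule-based stand-in for a learned tagger; flags one error per call."""
    tags = [KEEP] * len(tokens)
    for i, token in enumerate(tokens):
        if i > 0 and token == tokens[i - 1]:
            tags[i] = DELETE  # drop a repeated word
            return tags
        if token == "hoc":
            tags[i] = REPLACE + "học"  # restore a missing diacritic
            return tags
    return tags


corrected = correct(["tôi", "đi", "đi", "hoc"], toy_tagger)
print(" ".join(corrected))
```

Because every output token is either copied from the source or named explicitly by a tag, the output stays close to the input. This is the sense in which a tagging corrector avoids the uncontrolled outputs of a generative model.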