A New Method for Vietnamese Text Correction using Sequence Tagging Models
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | In this paper, we present a new approach for Vietnamese text error correction. The corrector consists of a Transformer that encodes the input sequence and sequence taggers that perform both error detection and correction. The taggers use special tokens to handle insertions, deletions, and substitutions. Correction is applied repeatedly until the stopping criteria are met, and each step edits only minimal spans of tokens in the source sentence. This design has two advantages: 1) it detects and corrects various error types in Vietnamese text, and 2) unlike generative models, it does not produce uncontrollable outputs. On a realistic dataset, the proposed model detects 79.5% of errors and corrects 62.7% of errors; its highest SacreBLEU score is 86.10, which is a promising result. |
---|---|
ISSN: | 2694-4804 |
DOI: | 10.1109/KSE59128.2023.10299446 |
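
The iterative tag-and-apply loop described in the summary can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the tag names (`$KEEP`, `$DELETE`, `$REPLACE_x`, `$APPEND_x`), the `max_steps` stopping criterion, and the dictionary-based `toy_tagger`, which stands in for the Transformer encoder and sequence taggers.

```python
from typing import Callable, List

# Hypothetical edit tags; the paper's actual special tokens may differ.
KEEP, DELETE = "$KEEP", "$DELETE"
REPLACE, APPEND = "$REPLACE_", "$APPEND_"

Tagger = Callable[[List[str]], List[str]]


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per token and return the edited token list."""
    out: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == DELETE:
            continue  # deletion: drop the token
        if tag.startswith(REPLACE):
            out.append(tag[len(REPLACE):])  # substitution
        elif tag.startswith(APPEND):
            out.extend([token, tag[len(APPEND):]])  # insertion after the token
        else:
            out.append(token)  # $KEEP: leave the token unchanged
    return out


def correct(tokens: List[str], tagger: Tagger, max_steps: int = 3) -> List[str]:
    """Re-tag and edit until no edits are predicted or max_steps is reached."""
    for _ in range(max_steps):
        tags = tagger(tokens)
        if all(tag == KEEP for tag in tags):
            break  # assumed stopping criterion: nothing left to edit
        tokens = apply_tags(tokens, tags)
    return tokens


def toy_tagger(tokens: List[str]) -> List[str]:
    """Stand-in for a trained tagger: fixes two known typos and repeated words."""
    fixes = {"tieng": "tiếng", "Vit": "Việt"}
    tags: List[str] = []
    for i, token in enumerate(tokens):
        if i > 0 and token == tokens[i - 1]:
            tags.append(DELETE)
        elif token in fixes:
            tags.append(REPLACE + fixes[token])
        else:
            tags.append(KEEP)
    return tags


corrected = correct(["tôi", "học", "học", "tieng", "Vit"], toy_tagger)
print(" ".join(corrected))
```

Because every output token is either a source token or comes from a fixed tag vocabulary, a loop of this shape cannot generate arbitrary text, which is the controllability property the summary contrasts with generative models.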