Loading…

A Framework for Indonesian Grammar Error Correction

Grammatical Error Correction (GEC) is a challenge in Natural Language Processing research. Although many researchers have been focusing on GEC in universal languages such as English or Chinese, few studies focus on Indonesian, which is a low-resource language. In this article, we proposed a GEC fram...

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on Asian and low-resource language information processing 2021-07, Vol.20 (4), p.1-12
Main Authors: Lin, Nankai, Chen, Boyu, Lin, Xiaotian, Wattanachote, Kanoksak, Jiang, Shengyi
Format: Article
Language:English
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Grammatical Error Correction (GEC) is a challenge in Natural Language Processing research. Although many researchers have been focusing on GEC in universal languages such as English or Chinese, few studies focus on Indonesian, which is a low-resource language. In this article, we proposed a GEC framework that has the potential to be a baseline method for Indonesian GEC tasks. This framework treats GEC as a multi-classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part of Speech (POS) error in Indonesian text. In addition, we constructed an Indonesian corpus that can be utilized as an evaluation dataset for Indonesian GEC research. Our framework was evaluated on this dataset. Results showed that the Long Short-Term Memory model based on word-embedding achieved the best performance. Its overall macro-average F 0.5 in correcting 10 POS error types reached 0.551. Results also showed that the framework can be trained on a low-resource dataset.
ISSN:2375-4699
2375-4702
DOI:10.1145/3440993