Loading…
ARABIC SOFT SPELLING CORRECTION WITH T5
Spelling correction is considered a challenging task for resource-scarce languages. The Arabic language is one of these resource-scarce languages, which suffers from the absence of a large spelling correction dataset, thus datasets injected with artificial errors are used to overcome this problem. I...
Saved in:
Published in: | Jordanian journal of computers and information technology (Online) 2024-03, Vol.10 (1), p.46-57 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Spelling correction is considered a challenging task for resource-scarce languages. The Arabic language is one of these resource-scarce languages, which suffers from the absence of a large spelling correction dataset, thus datasets injected with artificial errors are used to overcome this problem. In this paper, we trained the Text-to-Text Transfer Transformer (T5) model using artificial errors to correct Arabic soft spelling mistakes. Our T5 model can correct 97.8% of the artificial errors that were injected into the test set. Additionally, our T5 model achieves a character error rate (CER) of 0.77% on a set that contains real soft spelling mistakes. We achieved these results using a 4-layer T5 model trained with a 90% error injection rate, with a maximum sequence length of 300 characters. |
---|---|
ISSN: | 2413-9351 2415-1076 |
DOI: | 10.5455/jjcit.71-1699768515 |