Loading…

Arabic light-based stemmer using new rules

Superior stemming algorithms aid significantly in many natural language processing (NLP) applications such as information retrieval. Arabic light-based stemmer is one of the most important stemming algorithms. However, partially due to the highly inflected and complexity of Arabic language morpholog...

Full description

Saved in:
Bibliographic Details
Published in:Journal of King Saud University. Computer and information sciences 2022-10, Vol.34 (9), p.6635-6642
Main Authors: Alshalabi, Hamood, Tiun, Sabrina, Omar, Nazlia, AL-Aswadi, Fatima N., Ali Alezabi, Kamal
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Superior stemming algorithms aid significantly in many natural language processing (NLP) applications such as information retrieval. Arabic light-based stemmer is one of the most important stemming algorithms. However, partially due to the highly inflected and complexity of Arabic language morphological structure, most of the existing Arabic light-based stemmer algorithms eliminate a few numbers of suffixes and prefixes or both in the process of recognising the infix patterns to determine roots. The elimination of suffixes and prefixes leads to many inefficient results. Hence, this study aims to develop an improved light-based algorithm of the Arabic stemmer by proposing an appropriate suffixes and prefixes list, which is supported by rules according to word length (without using a morpheme or patterns on a stem). Our improved Dlight Arabic stemmer focuses on determining and removing the infix patterns under many rules on length-words and according to a specific order of the stages of the stemming to extract the double, triple and quadruple roots from long and short Arabic words. To evaluate our proposed light-based Arabic stemmer, we compared our stemmer against existing Arabic stemmers, namely Light10, Condlight and ARLST. The experimental results showed the proposed Develop Arabic Light-Based Stemmer (Dlight) obtained the best performance with 68% of F-measure, while the other three Arabic stemmers yield slightly lower F-measure. Finally, establishing an appropriate list of suffixes and prefixes with word length rules to stem Arabic words can improve the performance of a light-based Arabic stemmer.
ISSN:1319-1578
2213-1248
DOI:10.1016/j.jksuci.2021.08.017