Sustainable Topic Modeling for Legal Moroccan Arabic Language: A Challenging Study on BERTopic Technique

Bibliographic Details
Published in: Procedia Computer Science, 2024, Vol. 236, pp. 582-588
Main Authors: Aouichaty, Soufiane, Maleh, Yassine, Mohtadi, Mohamed Taib, Hajami, Abdelmajid, Allali, Hakim
Format: Article
Language: English
Description
Summary: Topic modeling approaches struggle to process legal texts because of their distinctive characteristics, such as the length of the documents and the specialized terminology they use. Topic modeling aims to uncover the semantic structure of a text, so legal documents call for specialized approaches. The timing of when legal documents are presented also strongly influences which topics are important. This paper explains and evaluates the application of BERTopic to topic modeling of legal documents. We experiment with BERTopic using several pre-trained Arabic language models as embedding models. Performance is evaluated with the Normalized Pointwise Mutual Information (NPMI) measure. Notably, our findings show that BERTopic with Arabic monolingual pre-trained models outperforms BERTopic with multilingual pre-trained models, offering insights into sustainable and efficient topic modeling for legal documents.
ISSN: 1877-0509
DOI: 10.1016/j.procs.2024.05.069
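The NPMI measure named in the summary scores a topic by how consistently its top words co-occur in a reference corpus. For a word pair, NPMI(w_i, w_j) = log[P(w_i, w_j) / (P(w_i) P(w_j))] / -log P(w_i, w_j), which ranges from -1 (never co-occur) through 0 (independent) to 1 (always co-occur); a topic's coherence is the mean over all pairs of its top words. The sketch below is a minimal, stdlib-only illustration, not the authors' evaluation code. It assumes document-level co-occurrence counts and treats the degenerate cases (a pair that never co-occurs, or a pair present in every document) with the bound values -1 and 1. The function name `npmi_coherence` and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all pairs of a topic's top words (illustrative sketch).

    Assumption: probabilities are estimated from document-level
    co-occurrence. P(w) is the fraction of documents containing w, and
    P(w_i, w_j) is the fraction containing both words.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")
    if not documents:
        raise ValueError("need at least one document")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w_i, w_j in combinations(topic_words, 2):
        p_ij = prob(w_i, w_j)
        if p_ij == 0.0:
            # The words never co-occur: NPMI takes its lower bound, -1.
            scores.append(-1.0)
        elif p_ij == 1.0:
            # Both words occur in every document; -log(1) = 0 would make the
            # ratio undefined, so this sketch uses the upper bound, 1.
            scores.append(1.0)
        else:
            pmi = math.log(p_ij / (prob(w_i) * prob(w_j)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


# Toy tokenized documents (hypothetical data, for illustration only).
docs = [
    ["contract", "sale", "price"],
    ["contract", "sale"],
    ["court", "judgment"],
    ["court", "appeal"],
]
print(npmi_coherence(["contract", "sale"], docs))   # ≈ 1.0: always co-occur
print(npmi_coherence(["contract", "court"], docs))  # -1.0: never co-occur
```

With a fitted BERTopic model, the top words of a topic can be taken from `topic_model.get_topic(topic_id)`, which returns `(word, score)` pairs. The embedding model compared in the study is chosen through the `embedding_model` argument of `BERTopic`. For larger corpora, gensim's `CoherenceModel` with `coherence="c_npmi"` is a maintained alternative. It estimates probabilities with a sliding window rather than whole documents, so its scores will differ from this sketch.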