Sustainable Topic Modeling for Legal Moroccan Arabic Language: A Challenging Study on BERTopic Technique
Published in: | Procedia Computer Science, 2024, Vol. 236, pp. 582-588 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | Topic modeling approaches face difficulties in processing legal texts because of their distinctive characteristics, such as document length and specialized terminology. Topic modeling involves uncovering the semantic structure of a text, so legal documents call for specific approaches. The timing of a legal document's presentation also strongly affects which topics are important. This paper explains and evaluates the application of BERTopic to topic modeling of legal documents. We experiment with BERTopic using several pre-trained Arabic language models as its embedding models, and we evaluate performance with the Normalized Pointwise Mutual Information (NPMI) measure. Notably, our findings reveal that BERTopic with monolingual Arabic pre-trained models outperforms BERTopic with multilingual pre-trained models, offering insights into sustainable and efficient topic modeling for legal documents. |
---|---|
ISSN: | 1877-0509 |
DOI: | 10.1016/j.procs.2024.05.069 |
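The summary names NPMI as the evaluation measure but does not state how it was estimated (reference corpus, sliding window versus whole-document co-occurrence, number of top words per topic). The following is a minimal sketch of the standard definition, NPMI(w_i, w_j) = log(P(w_i, w_j) / (P(w_i) P(w_j))) / (-log P(w_i, w_j)), assuming whole-document co-occurrence over a tokenized reference corpus and the common convention that pairs which never co-occur score -1. The function names `npmi_pair` and `topic_npmi` and the toy corpus are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from itertools import combinations
from typing import Iterable, List, Sequence, Set


def npmi_pair(n_i: int, n_j: int, n_ij: int, n_docs: int) -> float:
    """NPMI of one word pair, estimated from document frequencies.

    n_i, n_j: number of documents containing each word
    n_ij: number of documents containing both words
    n_docs: total number of documents in the reference corpus
    """
    if n_ij == 0:
        return -1.0  # convention: never co-occurring pairs get the minimum score
    p_i, p_j, p_ij = n_i / n_docs, n_j / n_docs, n_ij / n_docs
    if p_ij == 1.0:
        return 1.0  # both words occur in every document; -log(p_ij) would be 0
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)  # normalize PMI into [-1, 1]


def topic_npmi(top_words: Sequence[str], docs: Iterable[Iterable[str]]) -> float:
    """Average NPMI over all pairs of a topic's top words (higher = more coherent)."""
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")
    doc_sets: List[Set[str]] = [set(d) for d in docs]  # presence/absence per document
    n_docs = len(doc_sets)
    df = {w: sum(w in d for d in doc_sets) for w in top_words}
    scores = []
    for w_i, w_j in combinations(top_words, 2):
        n_ij = sum(w_i in d and w_j in d for d in doc_sets)
        scores.append(npmi_pair(df[w_i], df[w_j], n_ij, n_docs))
    return sum(scores) / len(scores)


# Toy, pre-tokenized corpus (illustrative only).
docs = [
    ["court", "appeal", "ruling"],
    ["court", "ruling"],
    ["tax", "law"],
    ["law", "court"],
]
print(round(topic_npmi(["court", "ruling"], docs), 3))  # → 0.415
```

In a BERTopic workflow, the top words for each topic could be taken from `get_topic(topic_id)`, which returns (word, score) pairs, and averaged across topics to compare embedding models. Libraries such as gensim also provide NPMI through `CoherenceModel` with `coherence="c_npmi"`, which by default uses a sliding window rather than whole documents, so its scores will generally differ from this sketch.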