Loading…

CatDetect, a framework for detecting Catalan tweets

This work deals with language detection. It includes new proposals ranging from lexicon and morphological analysis to an increasing use of machine learning solutions. In this case, the language study is focused on Catalan, a minority language. In the context of the Twitter social network, this incre...

Full description

Saved in:
Bibliographic Details
Published in:Multimedia tools and applications 2021-03, Vol.80 (7), p.10657-10677
Main Authors: Plaza, Sergi, Vilaplana, Jordi, Mateo, Jordi, Rius, Josep, Solsona, Francesc
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This work deals with language detection. It includes new proposals ranging from lexicon and morphological analysis to an increasing use of machine learning solutions. In this case, the language study is focused on Catalan, a minority language. In the context of the Twitter social network, this increases difficulty in detecting tweets (messages written on the Twitter social network). To achieve that, a Catalan-Twitter corpus was generated using lexical and morphological approaches, which then will be used to create supervised models based on machine learning techniques. They were also evaluated in order to see which obtains the best prediction score and thus, is the most suitable to be used. We demonstrate how our proposal is successful with Twitter in the case of minority languages. The best model is to be used on a website, where users can test the algorithm interactively in the front-end webpage and in background by means of a webservice across a RESTful API.
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-020-10182-3