One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

Bibliographic Details
Published in: arXiv.org 2022-03
Main Authors: Aji, Alham Fikri; Winata, Genta Indra; Koto, Fajri; Cahyawijaya, Samuel; Romadhony, Ade; Mahendra, Rahmad; Kurniawan, Kemal; Moeljadi, David; Prasojo, Radityo Eko; Baldwin, Timothy; Lau, Jey Han; Ruder, Sebastian
Format: Article
Language: English
Description
Summary: NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation of the world, we provide an overview of the current state of NLP research for Indonesia's 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for languages of Indonesia but also other underrepresented languages.
ISSN: 2331-8422