
Digital Inclusion and Culture: Training LLaMA-2 to Empower Kichwa Communities

Bibliographic Details
Main Authors: Leon, James, Riofrio, Daniel, Grijalva, Felipe, Tambaco, Kuymi
Format: Conference Proceeding
Language: English
Description
Summary: Language serves as a fundamental thread in humanity's social tapestry, intertwining individuality with collective identity. More than a mere instrument, it is culture, tradition, and, most significantly, the very definition of oneself. Historical narratives, often penned by a 'privileged' minority, marginalize linguistic 'others,' echoing a dissonance that threatens lesser-spoken languages. In this era of technological renaissance, Natural Language Processing (NLP) allows meaning to be drawn from human language through machine computation, in particular by means of neural network architectures such as Transformers. However, the scarcity of digital resources for these languages is a major obstacle to their inclusion in global digital dialogue. We present URKU, a model fine-tuned for Kichwa and assessed on a test set designed with input from linguistic experts. URKU applies Low-Rank Adaptation (LoRA) to Meta's open-source large language model, LLaMA-2, and specializes in Ecuadorian Kichwa. We contrast URKU's potential abilities with OpenAI's custom-GPT feature, which promotes availability and diversity, pointing to the capacity of LoRA to foster linguistic diversity. This study demonstrates that integrating minority languages into cutting-edge AI is feasible, a technical step towards digital inclusivity. We emphasize the importance of language in supporting active participation and the development of new knowledge bases, which are critical for the democratization of information and the empowerment of these communities. Our efforts advocate for a digital future that honors every voice, ensuring equitable representation beyond dominant narratives.
ISSN: 2573-1998
DOI: 10.1109/ICEDEG61611.2024.10702097