Loading…
Unknown Script: Impact of Script on Cross-Lingual Transfer
Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language...
Saved in:
Published in: | arXiv.org 2024-05 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size. |
---|---|
ISSN: | 2331-8422 |