SILT: Efficient transformer training for inter-lingual inference
Published in: | Expert systems with applications 2022-08, Vol.200, p.116923, Article 116923 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | The ability of transformers to perform precision tasks such as question answering, Natural Language Inference (NLI), or summarization has established them as one of the best paradigms for Natural Language Processing (NLP) tasks. NLI is one of the best scenarios for testing these architectures because of the knowledge required to understand complex sentences and the relationships established between a hypothesis and a premise. Nevertheless, these models generalize poorly to other domains and struggle in multilingual and interlingual scenarios. The leading approach in the literature to address these issues involves designing and training extremely large architectures, but this leads to unpredictable behavior and raises barriers that impede broad access and fine-tuning. In this paper, we propose a new architecture called Siamese Inter-Lingual Transformer (SILT). This architecture efficiently aligns multilingual embeddings for Natural Language Inference, allowing unmatched language pairs to be processed. SILT leverages siamese pre-trained multilingual transformers with frozen weights, in which the two input sentences attend to each other and are later combined through a matrix alignment method. The experimental results show that SILT drastically reduces the number of trainable parameters while enabling inter-lingual NLI and achieving state-of-the-art performance on common benchmarks.
• Efficient alignment of multilingual embeddings for Natural Language Inference.
• Siamese pretrained multilingual transformers with frozen weights and mutual attention.
• A curated Spanish version of the SICK dataset, called SICK-es, is provided.
• Drastic reduction of trainable parameters and ability to perform inter-lingual tasks. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2022.116923 |