Loading…

Multilingual Autoregressive Entity Linking

We present mGENRE, a sequence-to- sequence system for the Multilingual Entity Linking (MEL) problem—the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token i...

Full description

Saved in:
Bibliographic Details
Published in:Transactions of the Association for Computational Linguistics 2022-03, Vol.10, p.274-290
Main Authors: De Cao, Nicola, Wu, Ledell, Popat, Kashyap, Artetxe, Mikel, Goyal, Naman, Plekhanov, Mikhail, Zettlemoyer, Luke, Cancedda, Nicola, Riedel, Sebastian, Petroni, Fabio
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We present mGENRE, a sequence-to- sequence system for the Multilingual Entity Linking (MEL) problem—the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode mention string and entity names to capture more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB even for mentions that do not appear in mention tables and with no need for large-scale vector indices. While prior MEL works use a single representation for each entity, we match against entity names of as many languages as possible, which allows exploiting language connections between source input and target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to over 50% improvements in average accuracy. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks where we establish new state-of-the-art results. Source code available at .
ISSN:2307-387X
2307-387X
DOI:10.1162/tacl_a_00460