
ANETAC: Arabic Named Entity Transliteration and Classification Dataset


Bibliographic Details
Published in: arXiv.org 2019-07
Main Authors: Mohamed Seghir Hadj Ameur, Farid Meziane, Ahmed Guessoum
Format: Article
Language: English
Description
Summary: In this paper, we make freely accessible ANETAC, our English-Arabic named entity transliteration and classification dataset, which we built from freely available parallel translation corpora. The dataset contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which can be Person, Location, or Organization. The ANETAC dataset is mainly aimed at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
ISSN: 2331-8422
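
The triplet structure (e, a, c) described in the summary can be illustrated with a minimal Python sketch. The record does not state the dataset's file layout or label spelling, so the tab-separated format, the class codes PER, LOC and ORG, the names `Instance` and `parse_instances`, and the two sample rows are all assumptions made for illustration; they are not taken from ANETAC itself.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Assumed label codes; the summary only names Person, Location, Organization.
CLASSES = {"PER", "LOC", "ORG"}


@dataclass(frozen=True)
class Instance:
    """One (e, a, c) triplet: English entity, Arabic transliteration, class."""
    english: str
    arabic: str
    label: str


def parse_instances(text):
    """Parse tab-separated 'english<TAB>arabic<TAB>label' lines (assumed layout)."""
    instances = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        english, arabic, label = (field.strip() for field in row)
        if label not in CLASSES:
            raise ValueError(f"unexpected class label: {label!r}")
        instances.append(Instance(english, arabic, label))
    return instances


# Illustrative rows only, not actual dataset entries.
SAMPLE = "London\tلندن\tLOC\nAhmed\tأحمد\tPER\n"

instances = parse_instances(SAMPLE)
counts = Counter(instance.label for instance in instances)
```

Keeping the class as a separate field lets the same records serve both uses named in the summary: the (e, a) pairs for transliteration and the (e, c) or (a, c) pairs for classification.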