WordNet2Vec: Corpora agnostic word vectorization method

Bibliographic Details
Published in: Neurocomputing (Amsterdam) 2019-01, Vol. 326-327, p. 141-150
Main Authors: Bartusiak, Roman; Augustyniak, Łukasz; Kajdanowicz, Tomasz; Kazienko, Przemysław; Piasecki, Maciej
Format: Article
Language: English
Description
Summary: The complex nature of big data resources requires new structuring methods, especially for textual content. WordNet is a good knowledge source for the comprehensive abstraction of natural language, as good implementations exist for many languages. Since WordNet embeds natural language in the form of a complex network, this paper proposes a transformation mechanism, WordNet2Vec, which creates a vector for each word in WordNet. These vectors encapsulate the general position, or role, of a given word relative to all other words in the given natural language. Any list or set of such vectors contains knowledge about the context of its components within the whole language. This type of word representation can be easily applied to many analytic tasks such as classification or clustering. The usefulness of the WordNet2Vec method is demonstrated in sentiment analysis, including the classification of an Amazon opinion text dataset with transfer learning.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2017.01.121
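
Illustrative sketch: the summary describes vectors that capture a word's position relative to all other words in the WordNet network, but this record does not spell out the construction. The following is a minimal sketch of one plausible reading, assuming that "position" is measured by shortest-path distance in the graph. The toy edge list, the function names, and the `unreachable` placeholder are all hypothetical and are not taken from the paper or from real WordNet data.

```python
from collections import deque

# Hypothetical toy graph standing in for WordNet: nodes are words,
# edges are lexical relations. This is NOT real WordNet data.
TOY_EDGES = [
    ("good", "great"),
    ("good", "bad"),
    ("bad", "awful"),
    ("great", "excellent"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of word pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable word."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def word_vectors(edges, unreachable=-1):
    """Assumed construction: one vector per word, one component per word in
    the vocabulary, each holding the shortest-path distance between them."""
    adjacency = build_adjacency(edges)
    vocabulary = sorted(adjacency)
    vectors = {}
    for word in vocabulary:
        distances = shortest_path_lengths(adjacency, word)
        vectors[word] = [distances.get(other, unreachable) for other in vocabulary]
    return vocabulary, vectors


vocabulary, vectors = word_vectors(TOY_EDGES)
print(vocabulary)
print(vectors["good"])
```

Because every vector is indexed by the same vocabulary, the vectors do not depend on any training corpus, which matches the "corpora agnostic" claim in the title. A document could then be represented, for example, by aggregating the vectors of its words before passing them to a classifier; that aggregation step is likewise an assumption here.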