
Closed form word embedding alignment

Bibliographic Details
Published in:	Knowledge and Information Systems, 2021-03, Vol. 63, No. 3, pp. 565-588
Main Authors:	Dev, Sunipa; Hassan, Safia; Phillips, Jeff M.
Format:	Article
Language:	English
Subjects:	Closed form solutions; Computer Science; Data Mining and Knowledge Discovery; Database Management; Exact solutions; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Mathematical analysis; Optimization; Regular Paper; Similarity

Description
Summary:	We develop a family of techniques to align word embeddings that are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have closed-form solutions that optimally rotate, translate, and scale one embedding to minimize root mean squared error or to maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as absolute orientation, which are popular for aligning objects in three dimensions, and generalize an approach by Smith et al. (ICLR 2017). We prove new results for optimal scaling and for maximizing cosine similarity. We then show how to evaluate the similarity of embeddings from different sources or mechanisms, and demonstrate that certain properties, such as synonyms and analogies, are preserved across the embeddings and can be enhanced by simply aligning and averaging ensembles of embeddings.
ISSN:	0219-1377; 0219-3116
DOI:	10.1007/s10115-020-01531-7
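The summary describes closed-form alignment by rotation, translation, and scaling. The following is a minimal sketch of the standard absolute-orientation steps, assuming both embeddings are NumPy arrays whose rows correspond to the same vocabulary in the same order. The function names (align, apply_alignment, cosine_align, average_cosine, rmse) are hypothetical. The scale is the ordinary least-squares choice, and the cosine variant covers only the rotation-only case, so neither should be read as the paper's exact formulation or its new results.

```python
import numpy as np


def align(A, B, scale=True):
    """Align rows of A to rows of B (row i of each is the same word).

    Returns (R, s, mu_A, mu_B) such that s * (A - mu_A) @ R + mu_B ~ B
    in the least-squares (RMSE) sense. Hypothetical helper, not the paper's code.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Translation: center both embeddings at the origin.
    mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
    A_c, B_c = A - mu_A, B - mu_B
    # Rotation: orthogonal Procrustes via the SVD of the d x d matrix A_c^T B_c.
    U, S, Vt = np.linalg.svd(A_c.T @ B_c)
    R = U @ Vt  # orthogonal; may include a reflection (det(R) = -1)
    # Scale: least-squares optimum of ||s * A_c @ R - B_c||_F, which equals
    # trace(S) / ||A_c||_F^2 because R preserves norms (assumed formulation).
    s = S.sum() / np.sum(A_c ** 2) if scale else 1.0
    return R, s, mu_A, mu_B


def apply_alignment(A, R, s, mu_A, mu_B):
    """Map A into B's coordinates with the fitted transform."""
    return s * (np.asarray(A, dtype=float) - mu_A) @ R + mu_B


def cosine_align(A, B):
    """Orthogonal R maximizing the average cosine similarity (no translation).

    An orthogonal R preserves row norms, so the objective reduces to
    Procrustes on unit-normalized rows.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A_hat = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_hat = B / np.linalg.norm(B, axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd(A_hat.T @ B_hat)
    return U @ Vt


def average_cosine(X, Y):
    """Mean cosine similarity between corresponding rows of X and Y."""
    num = np.sum(X * Y, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    return float(np.mean(num / den))


def rmse(X, Y):
    """Root mean squared Euclidean distance between corresponding rows."""
    return float(np.sqrt(np.mean(np.sum((X - Y) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 50  # hypothetical vocabulary size and dimension
    A = rng.normal(size=(n, d))
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
    B = 2.5 * A @ Q + rng.normal(size=d)  # synthetic rotated, scaled, shifted copy
    R, s, mu_A, mu_B = align(A, B)
    print(round(float(s), 3), rmse(apply_alignment(A, R, s, mu_A, mu_B), B))
```

On this synthetic data the fitted scale should come out near 2.5 and the RMSE near zero. With real embeddings the residual RMSE or average cosine after alignment is what one would compare across sources. The SVD-based R is orthogonal but may be a reflection; if a proper rotation is required, the sign of the last column of U is usually flipped when det(R) is negative.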