Closed form word embedding alignment
Published in: | Knowledge and Information Systems 2021-03, Vol.63 (3), p.565-588 |
---|---|
Format: | Article |
Language: | English |
Summary: | We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as absolute orientation, which are popular for aligning objects in three dimensions, and generalize an approach by Smith et al. (ICLR 2017). We prove new results for optimal scaling and for maximizing cosine similarity. Then, we demonstrate how to evaluate the similarity of embeddings from different sources or mechanisms, and that certain properties like synonyms and analogies are preserved across the embeddings and can be enhanced by simply aligning and averaging ensembles of embeddings. |
---|---|
ISSN: | 0219-1377 0219-3116 |
DOI: | 10.1007/s10115-020-01531-7 |
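
The summary describes a closed-form rotation, translation, and scaling that minimizes the squared error between two embeddings of the same vocabulary. As a rough illustration only, the sketch below uses the standard orthogonal Procrustes (absolute orientation) solution via an SVD; it is not taken from the article and may differ from the authors' exact formulation. The function name `align_embeddings` and its parameters are hypothetical, and rows of `A` and `B` are assumed to correspond to the same words in the same order.

```python
import numpy as np

def align_embeddings(A, B, scale=True):
    """Align embedding A onto B (both n_words x dim, rows matched by word).

    Standard Procrustes sketch (an assumption, not the paper's code):
    minimizes || s * (A - mean_A) @ R + mean_B - B ||_F over orthogonal R
    and, optionally, a scalar s.
    """
    mu_a = A.mean(axis=0)
    mu_b = B.mean(axis=0)
    A0 = A - mu_a                      # translation: center both point sets
    B0 = B - mu_b
    U, S, Vt = np.linalg.svd(A0.T @ B0)
    R = U @ Vt                         # optimal orthogonal map (may include a reflection)
    s = S.sum() / (A0 ** 2).sum() if scale else 1.0
    aligned = s * (A0 @ R) + mu_b
    return aligned, R, s
```

If `B` is an exactly rotated, scaled, and shifted copy of `A`, the aligned result should match `B` up to floating-point error; on real embeddings the residual gives one way to compare how similar two embeddings are.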