Word sense induction with agglomerative clustering and mutual information maximization

Bibliographic Details
Published in:	AI Open, 2023, Vol. 4, pp. 193-201
Main Authors:	Abdine, Hadi; Kamal Eddine, Moussa; Buscaldi, Davide; Vazirgiannis, Michalis
Format:	Article
Language:	English
Subjects:	BERT; Computation and Language; Computer Science; Mutual information; Natural language processing; Transformer; Unsupervised machine learning; Word sense induction

Description
Summary:	Word sense induction (WSI) is a challenging problem in natural language processing that involves the unsupervised, automatic detection of a word's senses (i.e., meanings). Recent work achieves strong results on the WSI task by pre-training a language model dedicated exclusively to disambiguating word senses. Other work, in contrast, uses off-the-shelf pre-trained language models together with additional strategies to induce senses. This paper proposes a novel unsupervised method based on hierarchical clustering and invariant information clustering (IIC). The IIC loss is used to train a small model to maximize the mutual information between two vector representations of a target word occurring in a pair of synthetic paraphrases. This model is then used in inference mode to extract higher-quality vector representations for the hierarchical clustering. We evaluate our method on two WSI tasks and in two distinct clustering configurations (fixed and dynamic number of clusters). We empirically show that our approach is at least on par with the state-of-the-art baselines and outperforms them in several configurations. The code and data to reproduce this work are publicly available.
ISSN:	2666-6510
DOI:	10.1016/j.aiopen.2023.12.001
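The summary combines two components: an IIC objective on the representations of a target word in paired paraphrases, and hierarchical clustering with either a fixed or a dynamic number of clusters. The sketch below illustrates both in NumPy under stated assumptions; it is not the authors' released code. It assumes the small model's softmax outputs for the two paraphrases are already available as `p1` and `p2` (the model itself is omitted), uses the standard IIC objective (negative mutual information of the symmetrized joint assignment matrix), and, as a hypothetical choice, runs average-linkage clustering on cosine distance, where the "dynamic" mode simply stops merging once the closest pair of clusters is farther apart than a `threshold`. The names `iic_loss` and `agglomerative` are illustrative.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, turning model logits into soft cluster assignments."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def iic_loss(p1: np.ndarray, p2: np.ndarray, eps: float = 1e-12) -> float:
    """Hypothetical helper: negative mutual information (standard IIC form).

    p1, p2: (n, C) row-stochastic matrices; row i of p1 and row i of p2 are
    assumed to be the assignments of the same target word in a paraphrase pair.
    """
    n = p1.shape[0]
    joint = p1.T @ p2 / n              # (C, C) estimate of the joint distribution
    joint = (joint + joint.T) / 2.0    # symmetrize: pair order should not matter
    joint = np.clip(joint, eps, None)  # avoid log(0)
    joint /= joint.sum()
    pi = joint.sum(axis=1, keepdims=True)  # marginal of the first view
    pj = joint.sum(axis=0, keepdims=True)  # marginal of the second view
    mi = np.sum(joint * (np.log(joint) - np.log(pi) - np.log(pj)))
    return float(-mi)                  # minimizing this maximizes mutual information


def agglomerative(x: np.ndarray, n_clusters=None, threshold=None) -> np.ndarray:
    """Hypothetical helper: naive average-linkage clustering on cosine distance.

    Fixed mode: pass n_clusters. Dynamic mode (assumed criterion): pass threshold,
    and merging stops once the closest pair of clusters exceeds it.
    """
    if (n_clusters is None) == (threshold is None):
        raise ValueError("pass exactly one of n_clusters or threshold")
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    dist = 1.0 - xn @ xn.T             # pairwise cosine distance
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        if n_clusters is not None and len(clusters) <= n_clusters:
            break
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        if threshold is not None and best > threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                # b > a, so index a stays valid
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Passing `n_clusters` corresponds to the fixed configuration; passing `threshold` lets the data decide how many senses remain. The naive merge loop is cubic in the number of occurrences, which is tolerable for a few hundred instances of one target word; `scipy.cluster.hierarchy.linkage` is the usual choice at larger scale.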