Exploiting similarities across multiple dimensions for author name disambiguation

Bibliographic Details
Published in:	Scientometrics 2021-09, Vol.126 (9), p.7525-7560
Main Authors:	Pooja, KM, Mondal, Samrat, Chandra, Joydeep
Format:	Article
Language:	English
Subjects:	Bibliometrics; Clustering; Computer Science; Datasets; Embedding; Information Storage and Retrieval; Library Science; Representations; Statistical tests

Description
Summary:	In bibliometric analysis, ambiguity in author names may lead to erroneous aggregation of records. Author name disambiguation techniques attempt to address this issue by attributing each record to the corresponding author. Name disambiguation has been widely studied as a clustering task. However, maintaining consistent accuracy across datasets is still a major challenge. Recent efforts use representation-learning-based techniques to map the records to an embedding space that can be used to determine the clusters. However, some of these models that use supervised global embedding fail to generalize across different datasets, while others lag in accuracy. In this paper, we propose a method that uses two independent relations among documents, co-authorship and document meta-content, to generate a latent representation of documents that is capable of generalizing over various datasets (consisting of different sets of features). Through rigorous validation, we find that the proposed approach outperforms several state-of-the-art methods by a significant margin in terms of standard measures such as pairwise F1, K metric, and BF1 scores. Moreover, we have also validated the performance of our method with a statistical test.
ISSN:	0138-9130, 1588-2861
DOI:	10.1007/s11192-021-04101-y
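
The summary names pairwise F1 as one of the evaluation measures but does not define it. The sketch below shows the definition commonly used for clustering-based disambiguation: every pair of records is checked for whether it shares a true author and whether it shares a predicted cluster. The function name `pairwise_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from itertools import combinations


def pairwise_f1(true_labels, pred_labels):
    """Return (precision, recall, f1) over all record pairs.

    Illustrative helper (hypothetical name, not from the paper).
    true_labels[i] is the real author of record i;
    pred_labels[i] is the cluster assigned to record i.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    same_true = same_pred = same_both = 0
    for i, j in combinations(range(len(true_labels)), 2):
        t = true_labels[i] == true_labels[j]   # pair shares a true author
        p = pred_labels[i] == pred_labels[j]   # pair shares a predicted cluster
        same_true += int(t)
        same_pred += int(p)
        same_both += int(t and p)

    # Guard against empty denominators (e.g. every record in its own cluster).
    precision = same_both / same_pred if same_pred else 0.0
    recall = same_both / same_true if same_true else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (made up): five records, two real authors, two predicted clusters.
true_authors = ["a", "a", "a", "b", "b"]
clusters = [0, 0, 1, 1, 1]
precision, recall, f1 = pairwise_f1(true_authors, clusters)
print(precision, recall, f1)
```

In the toy data, four pairs share a true author, four pairs share a predicted cluster, and two pairs satisfy both, so precision, recall, and F1 each come out to 0.5.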