Loading…

Name Disambiguation Based on Graph Convolutional Network

Recently, massive online academic resources have provided convenience for scientific study and research. However, the author name ambiguity degrades the user experience in retrieving the literature bases. Extracting the features of papers and calculating the similarity for clustering constitute the...

Full description

Saved in:

Bibliographic Details
Published in:	Scientific programming 2021, Vol.2021, p.1-11
Main Authors:	Chen, Ya, Yuan, Hongliang, Liu, Tingting, Ding, Nan
Format:	Article
Language:	English
Subjects:	Accuracy Algorithms Artificial neural networks Cluster analysis Clustering Convolution Datasets Feature extraction Graphs Internet Keywords Machine learning Methods Names Technical services User experience
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Recently, massive online academic resources have provided convenience for scientific study and research. However, the author name ambiguity degrades the user experience in retrieving the literature bases. Extracting the features of papers and calculating the similarity for clustering constitute the mainstream of present name disambiguation approaches, which can be divided into two branches: clustering based on attribute features and clustering based on linkage information. They cannot however get high performance. In order to improve the efficiency of literature retrieval and provide technical support for the accurate construction of literature bases, a name disambiguation method based on Graph Convolutional Network (GCN) is proposed. The disambiguation model based on GCN designed in this paper combines both attribute features and linkage information. We first build paper-to-paper graphs, coauthor graphs, and paper-to-author graphs for each reference item of a name. The nodes in the graphs contain attribute features and the edges contain linkage features. The graphs are then fed to a specialized GCN and output a hybrid representation. Finally, we use the hierarchical clustering algorithm to divide the papers into disjoint clusters. Finally, we cluster the papers using a hierarchical algorithm. The experimental results show that the proposed model achieves average F1 value of 77.10% on three name disambiguation datasets. In order to let the model automatically select the appropriate number of convolution layers and adapt to the structure of different local graphs, we improve upon the prior GCN model by utilizing attention mechanism. Compared with the original GCN model, it increases the average precision and F1 value by 2.05% and 0.63%, respectively. What is more, we build a bilingual dataset, BAT, which contains various forms of academic achievements and will be an alternative in future research of name disambiguation.
ISSN:	1058-9244 1875-919X
DOI:	10.1155/2021/5577692