Deterministic sampling in heterogeneous graph neural networks

Bibliographic Details
Published in: Pattern Recognition Letters, 2023-08, Vol. 172, p. 74-81
Main Authors: Ansarizadeh, Fatemeh; Tay, David B.; Thiruvady, Dhananjay; Robles-Kelly, Antonio
Format: Article
Language: English
Description
Summary:
• Representation learning of heterogeneous data is modelled via graphs.
• A new deterministic sampling method for heterogeneous graphs.
• Effective in choosing significant neighbouring nodes for embedding.
• Superior performance in link prediction and node recommendation.

Graphs are typically used to model datasets where any given data point is correlated with only a small number of other data points in the set, i.e. localized correlations. In some datasets, the data points can be of different types, and this requires the use of heterogeneous graphs. Learning methods underpinned by graphs are used for analysis tasks such as node classification and link prediction. To exploit localized correlations in the learning process, sampling the neighbourhood of a candidate root node is typically required. The data from the sampled set of nodes can then be embedded and aggregated for use in an end-to-end neural network architecture. Previous approaches to sampling are stochastic in nature, e.g. random walk with restart. In this work, we propose a new approach to sampling that is deterministic in nature. The deterministic approach is based on the notion of node importance in relation to a root node. The factors that contribute to the importance are: (i) the distance (number of edges) from the root node; and (ii) the centrality measure of the node. In this study, we adopt the Katz measure as the centrality measure. By devising an efficient sampling method together with node embedding and aggregation methods, we propose a Deterministic Heterogeneous Graph Neural Network (D-HetGNN). The application of D-HetGNN to three datasets is presented, and an extensive experimental evaluation demonstrates the superiority of the proposed sampling.
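The sampling idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary names only the two importance factors (hop distance from the root and Katz centrality), so the scoring rule `katz[v] / distance`, the per-type grouping, the parameters `alpha`, `max_hops` and `k_per_type`, and the toy author/paper/venue graph are all assumptions made for illustration.

```python
from collections import deque


def katz_centrality(adj, alpha=0.1, beta=1.0, iters=200, tol=1e-10):
    """Katz centrality by fixed-point iteration x = alpha * A x + beta.

    Converges when alpha is below 1 / (largest eigenvalue of A).
    """
    x = {v: 0.0 for v in adj}
    for _ in range(iters):
        new = {v: beta for v in adj}
        for u, nbrs in adj.items():
            for v in nbrs:
                new[v] += alpha * x[u]
        delta = max(abs(new[v] - x[v]) for v in adj)
        x = new
        if delta < tol:
            break
    return x


def hop_distances(adj, root, max_hops):
    """Breadth-first search: edge count from root, limited to max_hops."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def deterministic_sample(adj, node_type, root, k_per_type, max_hops=2, alpha=0.1):
    """Pick the k highest-scoring neighbours of each type around root.

    ASSUMPTION: importance = Katz centrality / hop distance. The summary
    names these two factors but not how they are combined.
    """
    katz = katz_centrality(adj, alpha=alpha)
    dist = hop_distances(adj, root, max_hops)
    by_type = {}
    for v, d in dist.items():
        if v == root:
            continue
        score = katz[v] / d
        # Sorting on (-score, node id) makes tie-breaking deterministic too.
        by_type.setdefault(node_type[v], []).append((-score, v))
    return {t: [v for _, v in sorted(items)[:k_per_type]]
            for t, items in by_type.items()}


# Hypothetical toy graph: two authors, three papers, one venue (undirected).
adj = {
    "a1": ["p1", "p2"],
    "a2": ["p2", "p3"],
    "p1": ["a1", "v1"],
    "p2": ["a1", "a2", "v1"],
    "p3": ["a2", "v1"],
    "v1": ["p1", "p2", "p3"],
}
node_type = {"a1": "author", "a2": "author", "p1": "paper",
             "p2": "paper", "p3": "paper", "v1": "venue"}

sample = deterministic_sample(adj, node_type, root="a1", k_per_type=1)
print(sample)
```

Unlike random walk with restart, repeated calls with the same root return the same neighbour set, because both the scores and the tie-breaking order are fixed by the graph alone.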
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2023.05.022