BIM: Improving Graph Neural Networks with Balanced Influence Maximization

Bibliographic Details
Main Authors:	Zhang, Wentao; Gao, Xinyi; Yang, Ling; Cao, Meng; Huang, Ping; Shan, Jiulong; Yin, Hongzhi; Cui, Bin
Format:	Conference Proceeding
Language:	English
Subjects:	Data engineering; Graph neural networks; Imbalanced data classification; Industries; Influence imbalance; Influence maximization; Training

Description
Summary:	Imbalanced data classification has attracted considerable attention from both academia and industry, since data imbalance is widespread in real-world scenarios. Although the problem has been well studied from the perspective of imbalanced class samples, we further argue that graph neural networks (GNNs) expose a unique source of imbalance: the nodes influenced by labeled nodes of different classes. That is, labeled nodes are imbalanced in the number of nodes they influence during influence propagation in GNNs. To tackle this previously unexplored influence-imbalance issue, we connect social influence maximization with the imbalanced node classification problem and propose balanced influence maximization (BIM). Specifically, BIM greedily assigns a pseudo label to the node that maximizes the number of influenced nodes in GNN training while making the influence of each class more balanced. Experimental results on five public datasets demonstrate the effectiveness of our method in relieving the influence-imbalance issue. For example, when training a GCN with an imbalance ratio of 0.1, BIM significantly outperforms the most competitive baseline by 0.6% to 9.8% in F1 score across the five datasets.
ISSN:	2375-026X
DOI:	10.1109/ICDE60146.2024.00228
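
Illustrative sketch (not the authors' implementation): the summary describes BIM as greedily pseudo-labeling the node that adds the most influenced nodes while keeping per-class influence balanced. The record does not give the influence measure or the objective, so the toy Python below makes several assumptions: a node "influences" everything within k hops, candidate pseudo labels are supplied as input, and the score is marginal coverage minus a hypothetical `balance_weight` times the gap between the largest and smallest per-class influence. All function and parameter names are invented for illustration.

```python
from collections import deque


def k_hop_reach(adj, source, k):
    """Nodes within k hops of source, including source (assumed influence set)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def greedy_balanced_selection(adj, labeled, candidates, k=1, budget=1,
                              balance_weight=1.0):
    """Greedily pick candidates to pseudo-label.

    labeled:    dict node -> class (true labels)
    candidates: dict node -> predicted class (assumed to be given)
    Returns the chosen (node, class) pairs and per-class influence sizes.
    """
    classes = sorted(set(labeled.values()) | set(candidates.values()))
    reach = {n: k_hop_reach(adj, n, k) for n in list(labeled) + list(candidates)}

    # Influence already exerted by the truly labeled nodes, per class.
    covered = {c: set() for c in classes}
    for node, cls in labeled.items():
        covered[cls] |= reach[node]

    chosen = []
    remaining = dict(candidates)
    for _ in range(budget):
        all_covered = set().union(*covered.values())
        best, best_score = None, None
        for node, cls in remaining.items():
            gain = len(reach[node] - all_covered)  # newly influenced nodes
            sizes = [len(covered[c] | reach[node]) if c == cls else len(covered[c])
                     for c in classes]
            spread = max(sizes) - min(sizes)       # per-class imbalance after adding
            score = gain - balance_weight * spread
            if best_score is None or score > best_score:
                best, best_score = node, score
        if best is None:
            break
        cls = remaining.pop(best)
        covered[cls] |= reach[best]
        chosen.append((best, cls))
    return chosen, {c: len(s) for c, s in covered.items()}


# Toy graph: class A's labeled node 0 is a hub, class B's labeled node 4 is not.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0],
       4: [5], 5: [4, 6], 6: [5, 7], 7: [6, 8], 8: [7]}
labeled = {0: "A", 4: "B"}
candidates = {1: "A", 7: "B"}
chosen, sizes = greedy_balanced_selection(adj, labeled, candidates)
print(chosen, sizes)
```

In this toy graph the class-B candidate is preferred because it reaches nodes that no labeled node covers and narrows the gap between the two classes' influence. How the paper actually measures influence and trades coverage against balance is not stated in this record.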