GBP: Graph convolutional network embedded in bilinear pooling for fine-grained encoding

Bibliographic Details
Published in:	Computers & electrical engineering 2024-05, Vol.116, p.109158, Article 109158
Main Authors:	Du, Yinan; Tang, Jian; Rui, Ting; Li, Xinxin; Yang, Chengsong
Format:	Article
Language:	English
Subjects:	Bilinear pooling; Computer vision; Fine-grained recognition; Graph Neural Networks

Description
Summary:	In fine-grained recognition, classical high-order coding faces an inherent contradiction between visual burstiness and feature redundancy, rooted in the instability of high-order features. Existing methods mainly rely on eigenvalue decomposition (EIG) or singular value decomposition (SVD) to stabilize features, but this process increases feature redundancy. To address this problem, the paper proposes Graph Bilinear Pooling (GBP), a model that obtains stable fine-grained features through the aggregation ability of graph networks. GBP avoids explicit feature decomposition and reconciles the contradiction between visual burstiness and feature redundancy. First, GBP transforms images into a graph spectrum through feature correlation measurement. Then, an improved multi-head graph convolution structure based on Graph Isomorphism Networks (GIN) performs feature aggregation. Finally, bilinear pooling between the graph convolution feature maps and the original feature maps yields more compact and stable fine-grained feature representations. On the CUB, Cars, and Aircraft datasets, the proposed method reaches accuracies of 87.8%, 93.5%, and 89.6%, respectively, with a 2048-dimensional feature representation. Compared with the baseline, it uses only 25% as many features while improving accuracy by 2.6%, 1.7%, and 1.3%, respectively. These results demonstrate the effectiveness of graph neural network embedding in improving feature stability.
ISSN:	0045-7906 1879-0755
DOI:	10.1016/j.compeleceng.2024.109158
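
Illustrative sketch (not part of the record): the three steps named in the summary (a graph built from feature correlations, multi-head GIN-style aggregation, and bilinear pooling against the original features) can be outlined in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The cosine-similarity adjacency, the row normalization, the two-layer head weights, the signed square-root normalization, and every function name and dimension are hypothetical choices made only so that the output has 2048 dimensions, the figure quoted in the summary.

```python
import numpy as np

def correlation_adjacency(X):
    """Build a graph over local descriptors from pairwise correlation.
    Assumption: cosine similarity with negatives clipped, rows normalized."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    A = np.maximum(Xn @ Xn.T, 0.0)
    np.fill_diagonal(A, 0.0)  # the self term is handled by (1 + eps) below
    return A / (A.sum(axis=1, keepdims=True) + 1e-8)

def gin_head(X, A, W1, W2, eps=0.0):
    """One GIN-style head: MLP((1 + eps) * x_i + sum_j a_ij * x_j).
    Assumption: weighted neighbours and a two-layer ReLU MLP."""
    H = (1.0 + eps) * X + A @ X
    return np.maximum(H @ W1, 0.0) @ W2

def bilinear_pool(G, X):
    """Bilinear pooling between graph features G (N, Cg) and original
    features X (N, Cx); returns a vector of length Cg * Cx."""
    z = (G.T @ X / X.shape[0]).reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))  # signed square root (assumed)
    return z / (np.linalg.norm(z) + 1e-8)  # L2 normalization (assumed)

def gbp_sketch(X, head_params):
    """Concatenate the heads, then pool against the original features."""
    A = correlation_adjacency(X)
    G = np.concatenate([gin_head(X, A, W1, W2) for W1, W2 in head_params], axis=1)
    return bilinear_pool(G, X)

# Hypothetical sizes: 49 spatial positions, 64 channels, 4 heads of width 8,
# so the pooled vector has 4 * 8 * 64 = 2048 entries.
rng = np.random.default_rng(0)
N, C, heads, head_dim = 49, 64, 4, 8
X = rng.standard_normal((N, C))
head_params = [
    (0.1 * rng.standard_normal((C, C)), 0.1 * rng.standard_normal((C, head_dim)))
    for _ in range(heads)
]
z = gbp_sketch(X, head_params)
print(z.shape)  # (2048,)
```

In this sketch the descriptor length is set by the product of the total head width and the channel count, which is one way a bilinear descriptor can stay compact without an explicit EIG or SVD step; how the paper itself reaches 2048 dimensions is not stated in the summary.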