
Learning Rich Part Hierarchies With Progressive Attention Networks for Fine-Grained Image Recognition

Bibliographic Details
Published in:	IEEE transactions on image processing 2020-01, Vol.29, p.476-488
Main Authors:	Zheng, Heliang; Fu, Jianlong; Zha, Zheng-Jun; Luo, Jiebo; Mei, Tao
Format:	Article
Language:	English
Subjects:	Annotations; Artificial neural networks; Beak; Birds; Convolutional neural networks; Feature extraction; Fine-grained recognition; Head; Hierarchies; Image recognition; Object recognition; Part hierarchies; Progressive attention; Proposals; Structural hierarchy

Description
Summary:	We investigate the localization of subtle yet discriminative parts for fine-grained image recognition. Based on the observation that such parts typically exist within a hierarchical structure (e.g., from a coarse-scale "head" to a fine-scale "eye" when recognizing bird species), we propose a novel progressive-attention convolutional neural network (PA-CNN) to progressively localize parts at multiple scales. The PA-CNN localizes parts in two steps, where a part proposal network (PPN) generates multiple local attention maps, and a part rectification network (PRN) learns part-specific features from each proposal and provides the PPN with refined part locations. This coupling of the PPN and PRN allows them to be optimized in a mutually reinforcing manner, leading to improved pinpointing of fine-grained parts. Moreover, the convolutional parameters for a PPN at a finer scale can be inherited from the PRN at a coarser scale, enabling a rich part hierarchy (e.g., eye and beak in a bird's head) to be learned in a stacked fashion. Case studies show that PA-CNN can precisely identify parts without using bounding box/part annotations. In addition, quantitative evaluations demonstrate that PA-CNN yields state-of-the-art performance in three challenging fine-grained recognition tasks, i.e., CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
ISSN:	1057-7149 1941-0042
DOI:	10.1109/TIP.2019.2921876
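
Illustration:	The coarse-to-fine idea in the summary (a fine-scale part such as an "eye" is searched for inside a coarse-scale part such as a "head") can be pictured with a toy sketch. This is not the PA-CNN, PPN, or PRN from the article and involves no learning; it only assumes that two attention maps are already available as plain 2D lists. The function names, the `half` window parameter, and the example values are all hypothetical.

```python
def peak(att):
    """Return (row, col) of the largest value in a 2D attention map."""
    best = max(
        ((v, r, c) for r, row in enumerate(att) for c, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]


def crop_box(center, half, height, width):
    """Square box of half-size `half` around `center`, clipped to the map.

    Returned as (top, left, bottom, right) with exclusive bottom/right.
    """
    r, c = center
    top, left = max(0, r - half), max(0, c - half)
    bottom, right = min(height, r + half + 1), min(width, c + half + 1)
    return top, left, bottom, right


def localize_hierarchy(coarse_att, fine_att, half=1):
    """Locate a coarse part, then a finer part restricted to the coarse box."""
    h, w = len(coarse_att), len(coarse_att[0])
    box = crop_box(peak(coarse_att), half, h, w)
    top, left, bottom, right = box
    # Only the region inside the coarse part is searched at the finer scale.
    sub = [row[left:right] for row in fine_att[top:bottom]]
    r, c = peak(sub)
    # Map the fine peak back to coordinates of the full map.
    return box, (top + r, left + c)


# Hypothetical attention maps: the coarse peak is at (1, 1).
coarse = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# The fine map has a strong response at (3, 3), outside the coarse box.
fine = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.9],
]
box, fine_part = localize_hierarchy(coarse, fine)
```

In this toy example the strongest fine-scale response lies outside the coarse box and is ignored, so the fine part is reported at the best location inside the coarse part. That restriction is the only aspect of the hierarchy the sketch is meant to convey.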