Loading…
Cross-Convolutional-Layer Pooling for Image Recognition
Recent studies have shown that a Deep Convolutional Neural Network (DCNN) trained on a large image dataset can be used as a universal image descriptor and that doing so leads to impressive performance for a variety of image recognition tasks. Most of these studies adopt activations from a single DCN...
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2017-11, Vol.39 (11), p.2305-2313 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Recent studies have shown that a Deep Convolutional Neural Network (DCNN) trained on a large image dataset can be used as a universal image descriptor and that doing so leads to impressive performance for a variety of image recognition tasks. Most of these studies adopt activations from a single DCNN layer, usually a fully-connected layer, as the image representation. In this paper, we proposed a novel way to extract image representations from two consecutive convolutional layers: one layer is used for local feature extraction and the other serves as guidance to pool the extracted features. By taking different viewpoints of convolutional layers, we further develop two schemes to realize this idea. The first directly uses convolutional layers from a DCNN. The second applies the pretrained CNN on densely sampled image regions and treats the fully-connected activations of each image region as a convolutional layer's feature activations. We then train another convolutional layer on top of that as the pooling-guidance convolutional layer. By applying our method to three popular visual classification tasks, we find that our first scheme tends to perform better on applications which demand strong discrimination on lower-level visual patterns while the latter excels in cases that require discrimination on category-level patterns. Overall, the proposed method achieves superior performance over existing approaches for extracting image representations from a DCNN. In addition, we apply cross-layer pooling to the problem of image retrieval and propose schemes to reduce the computational cost. Experimental results suggest that the proposed method achieves promising results for the image retrieval task. |
---|---|
ISSN: | 0162-8828 1939-3539 2160-9292 |
DOI: | 10.1109/TPAMI.2016.2637921 |