Multi-scale pyramid pooling for deep convolutional representation

Bibliographic Details
Main Authors:	Yoo, Donggeun; Park, Sunggyun; Lee, Joon-Young; Kweon, In So
Format:	Conference Proceeding
Language:	English
Subjects:	Accuracy; Activation; Aggregates; Image recognition; Image representation; Indoor; Kernel; Neural networks; Object recognition; Proposals; Representations; State of the art; Support vector machines; Visualization

Description
Summary:	Compared to image representations based on low-level local descriptors, the deep neural activations of Convolutional Neural Networks (CNNs) are richer in mid-level representation but poorer in geometric invariance properties. In this paper, we present a straightforward framework for better image representation that combines the two approaches. To take advantage of both representations, we extract a fair amount of multi-scale dense local activations from a pre-trained CNN. We then aggregate the activations using the Fisher kernel framework, which has been modified with a simple scale-wise normalization that is essential to make it suitable for CNN activations. Our representation demonstrates new state-of-the-art performance on three public datasets: 80.78% (Acc.) on MIT Indoor 67, 83.20% (mAP) on PASCAL VOC 2007, and 91.28% (Acc.) on Oxford 102 Flowers. The results suggest that our proposal can be used as a primary image representation for better performance in a wide range of visual recognition tasks.
ISSN:	2160-7508, 2160-7516
DOI:	10.1109/CVPRW.2015.7301274