Loading…

Self-Selection Salient Region-Based Scene Recognition Using Slight-Weight Convolutional Neural Network

Visual scene recognition is an indispensable part of automatic localization and navigation. In the same scene, the appearance and viewpoint may be changed greatly, which is the largest challenge for some advanced unmanned systems,e.g. robot,vehicle and UAV,etc., to identify scenes where they have vi...

Full description

Saved in:
Bibliographic Details
Published in:Journal of intelligent & robotic systems 2021-07, Vol.102 (3), Article 58
Main Authors: Li, Zhenyu, Zhou, Aiguo
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Visual scene recognition is an indispensable part of automatic localization and navigation. In the same scene, the appearance and viewpoint may be changed greatly, which is the largest challenge for some advanced unmanned systems,e.g. robot,vehicle and UAV,etc., to identify scenes where they have visited. Traditional methods have been subjected to hand-made feature-based paradigms for a long time, mainly relying on the prior knowledge of the designer, and are not sufficiently robust to extreme changing scenes. In this paper, we cope with scene recognition with automatically learning the representation of features from big image samples. Firstly, we propose a novel approach for scene recognition via training a slight-weight convolutional neural network (CNN) that overall has less complex and more efficient network architecture, and is trainable in the manner of end-to-end. The proposed approach uses the deep-leaning features of self-selection combining with light CNN process to perform high semantic understanding of visual scenes. Secondly, we propose to employ a salient region-based technology to extract the local feature representation of a specific scene region directly from the convolution layer based on self-selection mechanism, and each layer performs a linear operation with end-to-end manner. Furthermore, we also utilize probability statistics to calculate the total similarity of several regions in one scene to other regions, and finally rank the similarity scores to select the correct scene. We have conducted a lot of experiments to evaluate the results of performance by comparing four methods (namely, our proposed and other three well known and advanced methods). Experimental results show that the proposed method is more robust and accurate than other three well-known methods in extremely harsh environments (e. g. weak light and strong blur).
ISSN:0921-0296
1573-0409
DOI:10.1007/s10846-021-01421-2