
Foreground segmentation using convolutional neural networks for multiscale feature encoding



Bibliographic Details
Published in: Pattern Recognition Letters, September 2018, Vol. 112, pp. 256-262
Main Authors: Lim, Long Ang; Yalim Keles, Hacer
Format: Article
Language:English
Summary:
• Two encoder-decoder type deep convolutional neural network models are proposed for multi-scale feature encoding.
• Multi-scale feature embedding is modeled in FgSegNet_M using a triplet convolutional network.
• Multi-scale feature embedding is modeled in FgSegNet_S using a Feature Pooling Module (FPM).
• The proposed models are robust in various difficult situations and can be trained using only a few frames, i.e. 50-200.
• FgSegNet_S is ranked first and FgSegNet_M second in the Change Detection 2014 Challenge.

Several methods have been proposed to solve the moving-object segmentation problem accurately in different scenes. However, many of them cannot handle difficult scenarios such as illumination changes, background or camera motion, camouflage effects, and shadows. To address these issues, we propose two robust encoder-decoder type neural networks that generate multi-scale feature encodings in different ways and can be trained end-to-end using only a few training samples. Using the same encoder-decoder configuration, the first model embeds an image in a multi-scale feature space with a triplet of encoders that take the input at three scales; the second model plugs a Feature Pooling Module (FPM) on top of a single-input encoder to extract multi-scale features in the middle layers. Both models use a transposed convolutional network in the decoder part to learn a mapping from feature space to image space. To evaluate our models, we entered the Change Detection 2014 Challenge (changedetection.net), where our models, FgSegNet_M and FgSegNet_S, outperformed all existing state-of-the-art methods with average F-Measures of 0.9770 and 0.9804, respectively. We also evaluate our models on the SBI2015 and UCSD Background Subtraction datasets. Our source code is publicly available at https://github.com/lim-anggun/FgSegNet.
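To make the FPM idea concrete, the sketch below shows an encoder-decoder with a feature-pooling block in the middle layers, in the spirit of FgSegNet_S. It is an illustrative assumption, not the published architecture: the function names (feature_pooling_module, build_segmenter), encoder depth, filter counts, branch structure, and dilation rates are all placeholders chosen for brevity; the exact configurations are in the authors' repository linked above.

```python
# Hypothetical Keras sketch of an FPM-style multi-scale block inside a small
# encoder-decoder segmenter. Layer counts, filter sizes, and dilation rates
# are illustrative assumptions, not the FgSegNet_S configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model

def feature_pooling_module(x, filters=64):
    """Extract multi-scale features by processing the same map with parallel branches."""
    pooled = layers.MaxPooling2D(pool_size=2, strides=1, padding="same")(x)
    branches = [
        layers.Conv2D(filters, 1, padding="same", activation="relu")(x),
        layers.Conv2D(filters, 3, padding="same", activation="relu")(pooled),
        layers.Conv2D(filters, 3, dilation_rate=4, padding="same", activation="relu")(x),
        layers.Conv2D(filters, 3, dilation_rate=8, padding="same", activation="relu")(x),
    ]
    # Concatenating the branches yields a multi-scale feature encoding.
    return layers.Concatenate()(branches)

def build_segmenter(input_shape=(240, 320, 3)):
    inp = layers.Input(shape=input_shape)
    # Encoder: a few conv/pool blocks (a stand-in for the actual encoder).
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Multi-scale feature encoding plugged on top of the single-input encoder.
    x = feature_pooling_module(x)
    # Decoder: transposed convolutions map features back to image space.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel foreground probability
    return Model(inp, out)

model = build_segmenter()
model.compile(optimizer="adam", loss="binary_crossentropy")
```

In FgSegNet_M the multi-scale encoding is instead obtained by feeding the input at three scales to a triplet of encoders and fusing their outputs before the decoder; the decoder side (transposed convolutions mapping feature space back to image space) is shared between both designs.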
ISSN: 0167-8655; 1872-7344
DOI: 10.1016/j.patrec.2018.08.002