Fast Video Saliency Detection via Maximally Stable Region Motion and Object Repeatability
Published in: IEEE Transactions on Multimedia, 2022, Vol. 24, pp. 4458-4470
Format: Article
Language: English
Summary: Motion information is an important cue in unsupervised video salient object detection. To estimate motion in videos, most methods adopt time-consuming algorithms such as large-displacement optical flow estimation (which needs 8-40 s per 640×480 frame), limiting saliency detection to 0.01-0.1 FPS and restricting its applications. In the human visual system, the motion of an object is usually perceived as a whole, so there is no need to compute the motion of every pixel. Instead, it is desirable to estimate the probability of each pixel belonging to a well-identifiable object, proposed as a maximally stable region (MSR) in recent work, and to compute object-level motion. Motivated by this intuition, we first propose a fast object-level video motion model based on MSR, which needs only 49 ms per 640×480 frame. Next, we present spatial-temporal boundary connectivity (BndCon) and spatial-temporal Minimum Barrier Distance (MBD) to estimate background probability and saliency. We then propose repeatability saliency, which measures how frequently an object recurs across the video sequence. Besides, we propose a simple yet effective method to combine our unsupervised method with a deep learning model to further boost performance. Compared with state-of-the-art unsupervised methods, our method shows significantly better performance at 12 FPS on ordinary CPU hardware.
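The record does not spell out the paper's spatial-temporal MBD formulation, but the underlying single-frame idea is well known: the barrier of a path is its maximum value minus its minimum value, and a pixel's MBD to background seeds (typically the image border) serves as its saliency. Below is a minimal, illustrative sketch of the common raster-scan approximation of the MBD transform; the function name, parameters, and pass count are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def mbd_transform(img, seed_mask, n_passes=4):
    """Approximate Minimum Barrier Distance (MBD) transform via raster scans.

    img:       2D float array (e.g. a grayscale frame in [0, 1]).
    seed_mask: boolean array, True at seed pixels (e.g. the image border,
               encoding the background prior).
    Returns D, where D[y, x] approximates the smallest barrier
    (path maximum minus path minimum) over paths from a seed to (y, x).
    """
    H, W = img.shape
    D = np.where(seed_mask, 0.0, np.inf)  # best barrier distance found so far
    U = img.copy()                        # highest value on that best path
    L = img.copy()                        # lowest value on that best path
    for p in range(n_passes):
        fwd = (p % 2 == 0)                # alternate forward/backward sweeps
        ys = range(H) if fwd else range(H - 1, -1, -1)
        xs = list(range(W)) if fwd else list(range(W - 1, -1, -1))
        nbrs = [(-1, 0), (0, -1)] if fwd else [(1, 0), (0, 1)]
        for y in ys:
            for x in xs:
                for dy, dx in nbrs:       # relax from already-visited neighbors
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        u = max(U[ny, nx], img[y, x])
                        l = min(L[ny, nx], img[y, x])
                        if u - l < D[y, x]:
                            D[y, x], U[y, x], L[y, x] = u - l, u, l
    return D
```

With border pixels as seeds, a bright object in the frame interior receives a large distance (salient) while background regions connected to the border at similar intensity stay near zero, which is the background-probability intuition the abstract describes.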
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2021.3094356