A dual-stream encoder–decoder network with attention mechanism for saliency detection in video(s)

Bibliographic Details
Published in: Signal, Image and Video Processing, 2024-04, Vol. 18 (3), pp. 2037-2046
Main Authors: Kumain, Sandeep Chand; Singh, Maheep; Awasthi, Lalit Kumar
Format: Article
Language:English
Description
Summary: Salient Object Detection (SOD) is a crucial task in digital image processing that aims to detect the objects in images or videos that attract particular human attention. In computer vision and image processing, these visually attentive objects are referred to as salient objects. Automatically recognizing these attention-grabbing objects is important for various applications such as video summarization, automated cropping for compression, image and video captioning, and action recognition. Over the last two decades, the research community has proposed various methods that mimic the human visual ability to find the object(s) that receive the most attention. Early methods relied primarily on conventional approaches, but deep learning-based techniques have recently gained significant interest and popularity for salient object detection in images and videos. In this work, the authors introduce an innovative model that employs a dual-stream encoder–decoder architecture for accurate saliency estimation in videos. Integrating an attention mechanism and non-local blocks makes the network more robust, which improves the identification of salient objects. To assess the proposed model's effectiveness, comprehensive evaluations were conducted on well-known, publicly available datasets such as VOS, DAVSOD, and ViSAL. The experimental results demonstrate that the proposed model achieves competitive performance against state-of-the-art methods on the S-Measure, F-Measure, and MAE (mean absolute error) evaluation metrics.
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-023-02833-3
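The summary reports results on S-Measure, F-Measure, and MAE. As an illustration of what two of these metrics compute, here is a minimal sketch of MAE and F-Measure for a single saliency map. It assumes conventions common in the SOD literature (predicted saliency values in [0, 1], a binary ground-truth mask, beta^2 = 0.3, and an adaptive threshold of twice the mean saliency, capped at 1.0). It is not the authors' evaluation code, the function names are hypothetical, and S-Measure is omitted because it combines region- and object-aware structural similarity terms that do not fit a short sketch.

```python
from typing import Optional, Sequence


def mae(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a saliency map in [0, 1] and a binary mask.

    Both maps are flat sequences of equal length (one value per pixel).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("maps must be non-empty and the same size")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(
    pred: Sequence[float],
    gt: Sequence[int],
    beta2: float = 0.3,
    threshold: Optional[float] = None,
) -> float:
    """F-Measure of a binarized saliency map against a binary mask.

    Assumption: beta2 = 0.3 and, if no threshold is given, the adaptive
    threshold 2 * mean(pred), capped at 1.0, as commonly used in SOD papers.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("maps must be non-empty and the same size")
    if threshold is None:
        threshold = min(2.0 * sum(pred) / len(pred), 1.0)
    # Binarize the saliency map at the chosen threshold.
    binary = [1 if p >= threshold else 0 for p in pred]
    true_pos = sum(1 for b, g in zip(binary, gt) if b and g)
    if true_pos == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = true_pos / sum(binary)
    recall = true_pos / sum(gt)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


if __name__ == "__main__":
    pred = [1.0, 0.8, 0.2, 0.0]  # toy 4-pixel saliency map
    gt = [1, 1, 0, 0]            # toy ground-truth mask
    print(mae(pred, gt))
    print(f_measure(pred, gt))                 # adaptive threshold
    print(f_measure(pred, gt, threshold=0.5))  # fixed threshold
```

In practice these per-frame scores are averaged over all annotated frames of a dataset; in this toy example the adaptive threshold (1.0) keeps only one of the two salient pixels, so recall drops and the F-Measure falls below the fixed-threshold result.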