
Content based video retrieval system using two stream convolutional neural network

Bibliographic Details
Published in: Multimedia Tools and Applications, 2023-07, Vol. 82 (16), pp. 24465-24483
Main Authors: Sowmyayani, S., Rani, P. Arockia Jansi
Format: Article
Language:English
Description
Summary: Nowadays, capturing video with mobile phones and digital cameras and uploading it to social media is commonplace. These videos carry no semantic tags, which makes them difficult for web users to search. Content Based Video Retrieval (CBVR) helps to identify the most relevant videos for a given video query. The objective of this paper is to retrieve the most relevant videos for a given query video in reduced time. To meet this objective, the paper proposes an efficient video retrieval system that uses salient object detection and keyframe extraction to reduce the high dimensionality of video data. Spatio-temporal features are extracted with a two-stream Convolutional Neural Network (CNN) and stored in a feature dataset. The salient objects are used to search for the exact subject given as the query. Relevant videos are identified by matching the features of the query video against the feature dataset built from the input dataset. To reduce the complexity of similarity matching, the proposed method replaces the feature dataset with a classification score dataset. Experiments are conducted on the TRECVID and CC_Web_Video datasets and evaluated using precision, recall, specificity, accuracy and f-score. Compared with recent methods, the proposed method achieves approximately 99.68% precision on the TRECVID dataset and 88.9% precision on the CC_Web_Video dataset. It outperforms recent methods by a 0.001 increase in mean Average Precision (mAP) on the CC_Web_Video dataset and a 4% increase in precision on the TRECVID dataset. The computation time is reduced by 100 min on TRECVID and by 200 min on CC_Web_Video.
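
The abstract describes the pipeline only at a high level. The sketch below illustrates the two core ideas, a two-stream spatio-temporal descriptor and retrieval by matching compact classification-score vectors, under assumptions not stated in this record: PyTorch with ResNet-18 backbones, softmax scores as the descriptor, and cosine similarity for ranking. It is an illustrative reconstruction, not the authors' implementation.

# Hedged sketch of two-stream feature extraction and score-based retrieval.
# Backbone choice (ResNet-18), layer sizes, and cosine-similarity ranking are
# assumptions for illustration; the paper's actual architecture is not given here.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class TwoStreamCNN(nn.Module):
    """Spatial stream on an RGB keyframe plus temporal stream on stacked optical flow."""

    def __init__(self, num_classes: int, flow_stack: int = 10):
        super().__init__()
        # Spatial stream: standard RGB backbone (assumed, not from the paper).
        self.spatial = models.resnet18(weights=None)
        self.spatial.fc = nn.Identity()
        # Temporal stream: first conv adapted to 2 * flow_stack flow channels (x and y).
        self.temporal = models.resnet18(weights=None)
        self.temporal.conv1 = nn.Conv2d(2 * flow_stack, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        self.temporal.fc = nn.Identity()
        # Classifier over the fused spatio-temporal feature; its softmax output is the
        # compact "classification score" vector stored instead of raw features.
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, rgb, flow):
        feat = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)
        return F.softmax(self.classifier(feat), dim=1)


def retrieve(query_scores, score_dataset, top_k=5):
    """Rank stored videos by cosine similarity of classification-score vectors."""
    sims = F.cosine_similarity(query_scores, score_dataset)
    return torch.topk(sims, k=min(top_k, score_dataset.size(0))).indices


if __name__ == "__main__":
    model = TwoStreamCNN(num_classes=24)      # number of query categories (assumed)
    rgb = torch.randn(1, 3, 224, 224)         # one extracted keyframe
    flow = torch.randn(1, 20, 224, 224)       # 10 stacked (x, y) optical-flow fields
    query_scores = model(rgb, flow)
    score_dataset = torch.rand(100, 24)       # precomputed scores for 100 stored videos
    print(retrieve(query_scores, score_dataset))

In this reading, matching low-dimensional score vectors rather than full CNN features is what cuts the similarity-matching cost, which is consistent with the reduced computation times reported in the abstract.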
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-023-14784-5