Content based video retrieval system using two stream convolutional neural network
Published in: Multimedia Tools and Applications, 2023-07, Vol. 82 (16), p. 24465-24483
Main Authors:
Format: Article
Language: English
Summary: Nowadays, capturing video with mobile phones and digital cameras and uploading it to social media is commonplace. These videos carry no semantic tags, which makes them difficult for web users to search. Content Based Video Retrieval (CBVR) helps identify the most relevant videos for a given query video. The objective of the paper is to retrieve the most relevant videos for a given query video in reduced time. To meet this objective, the paper proposes an efficient video retrieval system that uses salient object detection and keyframe extraction to reduce the high dimensionality of video data. Spatio-temporal features are extracted with a two-stream Convolutional Neural Network (CNN) and stored in a feature dataset. The salient objects are used to search for the exact subject given in the query. Relevant videos are identified by matching the features of the query video against the feature dataset built from the input dataset. To reduce the complexity of similarity matching, the proposed method replaces the feature dataset with a classification-score dataset. Experiments are conducted on the TRECVID and CC_Web_Video datasets and evaluated using precision, recall, specificity, accuracy and F-score. The proposed method is compared with recent methods and achieves approximately 99.68% precision on TRECVID and 88.9% precision on CC_Web_Video. It outperforms recent methods by a 0.001 increase in mean Average Precision (mAP) on CC_Web_Video and a 4% increase in precision on TRECVID. Computation time is reduced by 100 min on TRECVID and 200 min on CC_Web_Video.
ISSN: 1380-7501; 1573-7721
DOI: 10.1007/s11042-023-14784-5
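
The summary above describes ranking stored videos against a query by comparing classification-score vectors instead of raw feature vectors. The sketch below is a minimal illustration of that matching step, assuming score vectors have already been produced by the two-stream CNN; the names `retrieve_top_k`, `score_dataset`, and `query_scores` are hypothetical and the cosine-similarity ranking is only one plausible choice, not the authors' exact implementation.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two score vectors (small epsilon avoids division by zero)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def retrieve_top_k(query_scores: np.ndarray,
                   score_dataset: dict[str, np.ndarray],
                   k: int = 5) -> list[tuple[str, float]]:
    """Rank stored videos by similarity of their classification-score
    vectors to the query's score vector and return the top-k matches."""
    ranked = sorted(
        ((video_id, cosine_similarity(query_scores, scores))
         for video_id, scores in score_dataset.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Toy usage: three stored videos, each represented by a 4-class score vector.
score_dataset = {
    "video_a": np.array([0.70, 0.10, 0.10, 0.10]),
    "video_b": np.array([0.10, 0.80, 0.05, 0.05]),
    "video_c": np.array([0.60, 0.20, 0.10, 0.10]),
}
query_scores = np.array([0.65, 0.15, 0.10, 0.10])
print(retrieve_top_k(query_scores, score_dataset, k=2))
```

Comparing fixed-length class-score vectors rather than high-dimensional spatio-temporal features keeps each comparison cheap, which is consistent with the reduced computation time reported in the summary.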