Long Term Memory-Enhanced Via Causal Reasoning for Text-To-Video Retrieval
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: The T2VR task aims to retrieve videos that are semantically relevant to a given query text from a large collection of unlabeled videos. Most existing methods adopt a representation-encoding strategy that attends only to limited contextual information and lacks the ability to exploit the long-range memory of representation sequences. They also ignore the causal semantic influence that earlier elements of a sequence exert on later ones. To tackle this issue, we propose a new long-term memory-enhanced method via causal reasoning to better learn the feature sequences of video and text. First, we design semantic causal reasoning that allows video and text to adaptively capture full-memory contextual causal relations within their respective feature sequences and enhances the consistency of their semantic relations. Second, we perform key-feature reweighting in the memory spaces of video and text, respectively, so that key information stays in focus. Finally, extensive experiments on three public datasets, i.e., MSR-VTT, VATEX, and TGIF, demonstrate the effectiveness of our proposed method.
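The abstract's two ingredients — full-memory contextual causal relations over a feature sequence, and key-feature reweighting in memory space — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the causal step is approximated here by masked self-attention (each position attends only to itself and its predecessors), and the reweighting step by a purely illustrative norm-based importance score; all function names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(seq):
    """Masked self-attention: position t aggregates positions 0..t only,
    so later features are conditioned on their predecessors (a hypothetical
    stand-in for the paper's semantic causal reasoning)."""
    T, d = seq.shape
    scores = seq @ seq.T / np.sqrt(d)          # pairwise similarities
    mask = np.tril(np.ones((T, T), dtype=bool))  # lower-triangular = causal
    scores = np.where(mask, scores, -np.inf)     # block future positions
    return softmax(scores, axis=-1) @ seq

def reweight_key_features(seq):
    """Gate each frame/token by an importance weight (here: softmax over
    feature norms, rescaled so the mean weight is 1 — illustrative only)."""
    w = softmax(np.linalg.norm(seq, axis=-1))
    return seq * w[:, None] * len(seq)

rng = np.random.default_rng(0)
video_feats = rng.standard_normal((8, 16))  # 8 frames, 16-dim features
out = reweight_key_features(causal_self_attention(video_feats))
print(out.shape)  # (8, 16)
```

The same two steps would be applied to the text-side feature sequence, after which video and text representations can be compared with the usual cross-modal similarity.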
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10448201