Blind consumer video quality assessment with spatial-temporal perception and fusion
Published in: Multimedia Tools and Applications, 2024-02, Vol. 83 (7), pp. 18969-18986
Main Authors:
Format: Article
Language: English
Summary: Blind quality assessment for user-generated content (UGC) or consumer videos is a challenging problem in computer vision. Two open issues are yet to be addressed: how to effectively extract high-dimensional spatial-temporal features of consumer videos, and how to appropriately model the relationship between these features and user perception within a unified blind video quality assessment (BVQA) framework. To tackle these issues, we propose a novel BVQA model with spatial-temporal perception and fusion. First, we develop two perception modules that extract perceptual-distortion-related features separately from the spatial and temporal domains. In particular, the temporal-domain features are obtained by combining a 3D ConvNet with residual frames, chosen for their high efficiency in capturing motion-specific temporal features. Second, we propose a feature fusion module that adaptively combines the spatial-temporal features. Finally, we map the fused features onto perceptual quality. Experimental results demonstrate that our model outperforms other advanced methods in subjective video quality prediction.
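The "residual frames" named in the summary are, in common usage, frame-to-frame differences that suppress static background and emphasize motion before a 3D ConvNet sees the clip. The paper's exact formulation is not reproduced here, so the following NumPy sketch only illustrates the generic idea; the array shapes and the function name `residual_frames` are illustrative assumptions, not the authors' code:

```python
import numpy as np

def residual_frames(video: np.ndarray) -> np.ndarray:
    """Compute residual (difference) frames from a video clip.

    video: array of shape (T, H, W, C) holding T frames.
    Returns an array of shape (T-1, H, W, C) where each residual is a
    frame minus its predecessor, highlighting motion-specific content.
    """
    frames = video.astype(np.float32)
    return frames[1:] - frames[:-1]

# Tiny example: 4 frames of a 2x2 single-channel "video".
clip = np.arange(4 * 2 * 2, dtype=np.float32).reshape(4, 2, 2, 1)
res = residual_frames(clip)
print(res.shape)  # → (3, 2, 2, 1)
```

In a full pipeline, `res` (stacked along the temporal axis) would be the input to the temporal-domain 3D ConvNet, while raw frames feed the spatial perception module before fusion.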
ISSN: 1380-7501
eISSN: 1573-7721
DOI: 10.1007/s11042-023-16242-8