Multi-level feature disentanglement network for cross-dataset face forgery detection
Published in: Image and Vision Computing, 2023-07, Vol. 135, p. 104686, Article 104686
Main Authors: , , , , , ,
Format: Article
Language: English
Summary: Synthesizing videos with forged faces is a safety-critical problem that has caused severe security issues in recent years. Although many existing face forgery detection methods achieve superior performance on such synthetic videos, they are severely limited by their domain-specific training data and generally perform unsatisfactorily when transferred to cross-dataset scenarios because of domain gaps. Based on this observation, in this paper we propose a multi-level feature disentanglement network that is robust to the domain bias induced by the different types of fake artifacts in different datasets. Specifically, we first detect the face image and transform it into both color-aware and frequency-aware inputs for multi-modal contextual representation learning. We then introduce a novel feature disentangling module that uses a pair of complementary attention maps to disentangle the synthetic features into separate realistic features and fake-artifact features. Since the artifact features are obtained indirectly from the latent features rather than from a dataset-specific distribution, our forgery detection model is robust to dataset-specific domain gaps. By applying the disentangling module at multiple levels of the feature extraction network with multi-modal inputs, we obtain more robust feature representations. In addition, a realistic-aware adversarial loss and a domain-aware adversarial loss are adopted to facilitate better feature disentanglement and extraction. Extensive experiments on four datasets verify the generalization of our method and demonstrate state-of-the-art performance.
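The core idea in the summary — splitting synthetic features into realistic and artifact components with a pair of complementary attention maps — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `disentangle`, the use of a sigmoid to form the attention map, and the toy inputs are all assumptions for demonstration.

```python
import numpy as np

def disentangle(features, attention_logits):
    """Split features into realistic and artifact components using a pair
    of complementary attention maps: a and (1 - a)."""
    a = 1.0 / (1.0 + np.exp(-attention_logits))  # realistic-attention map in (0, 1)
    realistic = a * features                     # weighted toward realistic content
    artifact = (1.0 - a) * features              # complement isolates fake artifacts
    return realistic, artifact

# Toy example: a 4-dim feature vector with hypothetical learned logits
# that mark the first half as "realistic" and the second half as "artifact".
feats = np.array([1.0, 2.0, 3.0, 4.0])
logits = np.array([5.0, 5.0, -5.0, -5.0])
real, art = disentangle(feats, logits)
print(np.allclose(real + art, feats))  # → True
```

Because the two maps sum to one, the two branches always recombine to the original features, so no information is discarded; the separation only decides which branch carries each feature dimension.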
Highlights:
- Propose to disentangle synthetic face features into realistic and artifact features.
- A novel multi-level feature disentanglement network performs the disentanglement.
- Realistic-aware and domain-aware discrimination losses strengthen the disentanglement.
- Achieves state-of-the-art performance on cross-dataset forgery detection.
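The frequency-aware input mentioned in the summary can also be sketched briefly. A common way to expose blending artifacts is a log-magnitude Fourier spectrum of the face crop; whether the authors use an FFT, a DCT, or another transform is not stated in this record, so treat this as an assumed example.

```python
import numpy as np

def frequency_aware_input(gray_face):
    """Convert a grayscale face crop into a log-magnitude frequency map;
    forgery blending often leaves periodic traces visible in this domain."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray_face))  # center low frequencies
    return np.log1p(np.abs(spectrum))                   # compress dynamic range

# Stand-in for a detected 64x64 face crop (random values, for illustration only)
face = np.random.default_rng(0).random((64, 64))
freq = frequency_aware_input(face)
print(freq.shape)  # → (64, 64)
```

The resulting map has the same spatial size as the input, so it can be fed to the network alongside the color-aware input as an extra modality.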
ISSN: 0262-8856
DOI: 10.1016/j.imavis.2023.104686