Feature-filter: Detecting adversarial examples by filtering out recessive features

Bibliographic Details
Published in: Applied Soft Computing, 2022-07, Vol. 124, p. 109027, Article 109027
Main Authors: Liu, Hui; Zhao, Bo; Ji, Minzhi; Peng, Yuefeng; Guo, Jiabao; Liu, Peng
Format: Article
Language:English
Description
Summary: Deep neural networks (DNNs) have achieved state-of-the-art performance in numerous tasks involving complex analysis of raw data, such as self-driving systems and biometric recognition systems. However, recent work has shown that DNNs are vulnerable to adversarial example attacks: an adversary can easily change a DNN's output by adding small, well-designed perturbations to its input. Adversarial example detection is fundamental to robust DNN-based services. From a human-centric perspective, this paper divides image features into dominant features, which humans can comprehend, and recessive features, which humans cannot comprehend but DNNs nonetheless exploit. Based on this perspective, the paper proposes a new viewpoint: imperceptible adversarial examples are the product of recessive features misleading neural networks, and adversarial attacks enrich these recessive features. The imperceptibility of adversarial examples indicates that the perturbations enrich recessive features but hardly affect dominant features. Therefore, adversarial examples are sensitive to the removal of recessive features, while benign examples are immune to it. Inspired by this idea, we propose a label-only adversarial detector called the feature-filter. The feature-filter uses the discrete cosine transform (DCT) to approximately separate recessive features from dominant features and obtain a filtered image. A comprehensive user study demonstrates that the DCT-based filter can reliably filter out recessive features from the test image. By comparing only the DNN's predicted labels on the input and on its filtered version, the feature-filter can detect imperceptible adversarial examples in real time with high accuracy and few false positives.
Highlights:
• We reveal the reason for the existence of imperceptible adversarial examples.
• We propose a label-only approach to detect imperceptible adversarial examples.
• We design a DCT-based filter to reliably filter out recessive features.
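The record does not spell out the filter's exact design, so the following is only a minimal NumPy sketch of the label-only idea, under two assumptions that are not taken from the paper: that recessive features live mostly in the high-frequency DCT coefficients, and that a single whole-image low-pass mask with a hypothetical cutoff `keep` is an adequate filter. The names `dct_matrix`, `dct_lowpass_filter`, `is_adversarial` and `toy_predict` are illustrative, not the authors' code, and `toy_predict` merely stands in for a trained DNN.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the 1-D DCT of x, and C.T inverts it."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses the smaller normalisation
    return c


def dct_lowpass_filter(img: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the keep x keep lowest-frequency DCT coefficients of an image.

    ASSUMPTION: high-frequency coefficients approximate "recessive" features.
    img is (H, W) or (H, W, C); colour channels are filtered independently.
    `keep` is a hypothetical cutoff, not a value from the paper.
    """
    if img.ndim == 3:
        return np.stack(
            [dct_lowpass_filter(img[..., ch], keep) for ch in range(img.shape[-1])],
            axis=-1,
        )
    h, w = img.shape
    ch_mat, cw_mat = dct_matrix(h), dct_matrix(w)
    coeffs = ch_mat @ img @ cw_mat.T  # forward 2-D DCT-II
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0  # low frequencies sit in the top-left corner
    return ch_mat.T @ (coeffs * mask) @ cw_mat  # inverse 2-D DCT (orthonormal basis)


def is_adversarial(predict, img: np.ndarray, keep: int) -> bool:
    """Label-only check: flag the input if filtering changes the predicted label."""
    return predict(img) != predict(dct_lowpass_filter(img, keep))


def toy_predict(x: np.ndarray) -> int:
    # Hypothetical stand-in for a DNN: class 1 if the image has any texture at all.
    return int(x.std() > 1e-6)


if __name__ == "__main__":
    basis = dct_matrix(8)
    benign = np.full((8, 8), 0.5)
    # Perturb only the highest-frequency DCT coefficient, (7, 7).
    perturbed = benign + 0.05 * np.outer(basis[7], basis[7])
    print(is_adversarial(toy_predict, benign, keep=4))     # False
    print(is_adversarial(toy_predict, perturbed, keep=4))  # True
```

Because the orthonormal DCT keeps the DC coefficient whenever `keep >= 1`, this filter preserves the image mean, so coarse content survives while perturbation energy above the cutoff is discarded. The cutoff presumably trades false positives against detection rate and would need tuning on benign data.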
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2022.109027