Besting the Black-Box: Barrier Zones for Adversarial Example Defense

Bibliographic Details
Published in:	IEEE Access, 2022, Vol. 10, pp. 1451-1474
Main Authors:	Mahmood, Kaleel; Nguyen, Phuong Ha; Nguyen, Lam M.; Nguyen, Thanh; Van Dijk, Marten
Format:	Article
Language:	English
Subjects:	adversarial defense; adversarial examples; Adversarial machine learning; black-box attack; Datasets; Deep learning; Experimentation; Machine learning; Measurement; Robustness; Security; Training data; Transforms

Description
Summary:	Adversarial machine learning defenses have primarily been focused on mitigating static, white-box attacks. However, it remains an open question whether such defenses are robust under an adaptive black-box adversary. In this paper, we specifically focus on the black-box threat model and make the following contributions. First, we develop an enhanced adaptive black-box attack, which is experimentally shown to be at least 30% more effective than the original adaptive black-box attack proposed by Papernot et al. Second, we test 10 recent defenses using our new attack and propose our own black-box defense (barrier zones). We show that our defense based on barrier zones offers significant improvements in security over state-of-the-art defenses. This improvement includes greater than 85% robust accuracy against black-box boundary attacks, transfer attacks, and our new adaptive black-box attack, for the datasets we study. For completeness, we verify our claims through extensive experimentation with 10 other defenses using three adversarial models (14 different black-box attacks) on two datasets (CIFAR-10 and Fashion-MNIST).
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3138966