Targeted Speech Adversarial Example Generation With Generative Adversarial Network
Although neural network-based speech recognition models have enjoyed significant success in many acoustic systems, they are susceptible to adversarial examples. In this work, we take a first step towards using a generative adversarial network (GAN) to construct targeted speech...
Published in: | IEEE Access 2020, Vol.8, p.124503-124513 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | Although neural network-based speech recognition models have enjoyed significant success in many acoustic systems, they are susceptible to adversarial examples. In this work, we take a first step towards using a generative adversarial network (GAN) to construct targeted speech adversarial examples. Specifically, we integrate the target speech recognition network into the GAN framework, which can then be formulated as a three-party game. The generator aims to produce a perturbation that makes the target network misclassify the input as a specific target label, while simultaneously fooling the discriminator into treating the adversarial example as a benign one. The discriminator tries to distinguish the crafted examples from the genuine samples. The target network is responsible for back-propagating its classification error via gradients to the generator for updating, but the target network itself is frozen. With a carefully designed network architecture, loss function and training strategy, we successfully train a generator that can generate the adversarial perturbation for a given speech clip and a target label. Experimental results show that the generated adversarial examples can effectively fool state-of-the-art speech classification networks while attaining acceptable auditory perceptual quality. In addition, our proposed method runs much faster than the prevalent optimization-based schemes. To facilitate reproducible research, code, models and data are publicly available at https://github.com/winterwindwang/SpeechAdvGan. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2020.3006130 |
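
The three-party game described in the summary can be illustrated with a rough sketch of how a generator objective might combine its two goals: fooling the discriminator and steering the frozen target network toward the chosen label. This is not the authors' loss function, which the summary does not specify. The function names (`generator_loss`, `make_adversarial`), the weights `w_adv`, `w_cls`, `w_norm`, and the mean-squared perturbation penalty used as a stand-in for auditory quality are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_adversarial(clip, perturbation, bound=1.0):
    """Add the perturbation to the clip and clamp samples to [-bound, bound]."""
    return [max(-bound, min(bound, x + p)) for x, p in zip(clip, perturbation)]


def generator_loss(d_prob_real, target_logits, target_label, perturbation,
                   w_adv=1.0, w_cls=1.0, w_norm=0.1):
    """Hypothetical combined generator objective (weights are illustrative).

    d_prob_real   : discriminator's probability that the adversarial clip is genuine
    target_logits : frozen target network's logits on the adversarial clip
    target_label  : index of the label the attack wants the target network to output
    perturbation  : the generated perturbation samples
    """
    eps = 1e-12
    # Term 1: fool the discriminator (non-saturating GAN generator loss).
    adv_term = -math.log(d_prob_real + eps)
    # Term 2: cross-entropy pushing the frozen target network toward target_label.
    cls_term = -math.log(softmax(target_logits)[target_label] + eps)
    # Term 3: mean squared perturbation, an assumed proxy for audible distortion.
    norm_term = sum(p * p for p in perturbation) / len(perturbation)
    return w_adv * adv_term + w_cls * cls_term + w_norm * norm_term
```

In a real training loop, only the generator's parameters would be updated from this loss. The target network contributes gradients through `target_logits` but keeps its own weights fixed, as the summary states.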