Deep4SNet: deep learning for fake speech classification
Published in: | Expert systems with applications 2021-12, Vol.184, p.115465, Article 115465 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Deep4SNet is a text-independent classifier of original/fake speech recordings.
• It is based on a customized deep learning architecture.
• Speech recordings are transformed into histograms to feed the model.
• Experiments are performed on the Deep Voice and Imitation datasets.
• The accuracy of the classifier is over 98%.
Fake speech consists of voice recordings created either by artificial intelligence or by signal processing techniques. Among the methods for generating fake voice recordings are Deep Voice and Imitation. Deep Voice recordings sound slightly synthesized, whereas Imitation recordings sound natural. At the same time, detecting fake content is not trivial, given the large number of voice recordings transmitted over the Internet. To detect fake voice recordings produced by Deep Voice and Imitation, we propose a solution based on a Convolutional Neural Network (CNN) that uses image augmentation and dropout. The proposed architecture was trained on 2092 histograms of both original and fake voice recordings and cross-validated on 864 histograms. A further 476 new histograms were used for external validation, on which Precision (P) and Recall (R) were calculated. Detection of fake audio reached P = 0.997, R = 0.997 for Imitation-based recordings, and P = 0.985, R = 0.944 for Deep Voice-based recordings. The global accuracy was 0.985. According to these results, the proposed system is successful in detecting fake voice content. |
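The summary states only that recordings are transformed into histograms before being fed to the model; the kind of histogram, the bin count, and how it is rendered as an image are not described in this record. A minimal stdlib-Python sketch, assuming 16-bit mono PCM WAV input and an amplitude histogram with a hypothetical 64 bins (`read_pcm16_mono` and `amplitude_histogram` are illustrative names, not the authors' code):

```python
import array
import sys
import wave
from collections.abc import Sequence


def read_pcm16_mono(path: str) -> list[int]:
    """Read a 16-bit mono PCM WAV file into a list of signed samples."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)  # 'h' = signed 16-bit integers
    # WAV sample data is little-endian; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()
    return samples.tolist()


def amplitude_histogram(samples: Sequence[int], n_bins: int = 64,
                        lo: int = -32768, hi: int = 32767) -> list[float]:
    """Normalized histogram of sample amplitudes over the range [lo, hi].

    Assumption: an amplitude histogram with n_bins = 64 is a hypothetical
    choice for illustration, not a setting taken from the paper.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    counts = [0] * n_bins
    width = (hi - lo + 1) / n_bins  # amplitude span covered by each bin
    for s in samples:
        s = min(max(s, lo), hi)  # clamp out-of-range values
        counts[min(int((s - lo) / width), n_bins - 1)] += 1
    total = len(samples)
    # Divide by the sample count so bins sum to 1 regardless of duration.
    return [c / total for c in counts] if total else [0.0] * n_bins
```

Normalizing by the sample count keeps recordings of different lengths comparable. The resulting vector would still have to be rendered as an image (for example, a bar plot) before an image-based CNN with augmentation could consume it, a step this record does not detail.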
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2021.115465 |
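The abstract reports Precision (P) and Recall (R) for each generation method plus a global accuracy, but this record does not give the confusion counts behind those figures. The standard definitions, P = TP / (TP + FP), R = TP / (TP + FN) and accuracy = (TP + TN) / N, can be sketched as follows, assuming fake recordings are the positive class; the `Confusion` type and the counts in the demo are hypothetical and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; treating 'fake' as positive is an assumption."""
    tp: int  # fake recordings predicted fake
    fp: int  # original recordings predicted fake
    fn: int  # fake recordings predicted original
    tn: int  # original recordings predicted original


def precision(c: Confusion) -> float:
    """Fraction of recordings flagged as fake that really are fake."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    """Fraction of fake recordings that the classifier flags."""
    actual_fake = c.tp + c.fn
    return c.tp / actual_fake if actual_fake else 0.0


def accuracy(c: Confusion) -> float:
    """Fraction of all recordings classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's data.
    demo = Confusion(tp=95, fp=2, fn=5, tn=98)
    print(f"P={precision(demo):.3f} R={recall(demo):.3f} acc={accuracy(demo):.3f}")
```

A gap between P and R, as in the Deep Voice figures (P = 0.985, R = 0.944), means the classifier rarely raises false alarms but misses a larger share of fake recordings.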