Loading…

GAN acoustic model for Kazakh speech synthesis

Recent studies on the application of generative adversarial networks (GAN) for speech synthesis have shown improvements in the naturalness of synthesized speech, compared to the conventional approaches. In this article, we present a new framework of GAN to train an acoustic model for speech synthesi...

Full description

Saved in:
Bibliographic Details
Published in:International journal of speech technology 2021-09, Vol.24 (3), p.729-735
Main Authors: Kaliyev, Arman, Zeno, Bassel, Rybin, Sergey V., Matveev, Yuri N., Lyakso, Elena E.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Recent studies on the application of generative adversarial networks (GAN) for speech synthesis have shown improvements in the naturalness of synthesized speech, compared to the conventional approaches. In this article, we present a new framework of GAN to train an acoustic model for speech synthesis. The proposed GAN consists of a generator and a pair of agent discriminators, where the generator produces acoustic parameters taking into account linguistic parameters; and the pair of agent discriminators are introduced to improve the naturalness of the synthesized speech. We feed the agents with acoustic and linguistic parameters, thereby the agents do not only examine the acoustic distribution, but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. According to the results of this research, the proposed framework of GAN improves the accuracy of the acoustic model for the Kazakh text-to-speech system.
ISSN:1381-2416
1572-8110
DOI:10.1007/s10772-021-09840-0