
HMM-based speech synthesis adaptation using noisy data: Analysis and evaluation methods


Bibliographic Details
Main Authors: Karhila, Reima; Remes, Ulpu; Kurimo, Mikko
Format: Conference Proceeding
Language: English
Description
Summary: This paper investigates the role of noise in speaker adaptation of HMM-based text-to-speech (TTS) synthesis and presents a new evaluation procedure. Both a new listening test based on ITU-T Recommendation P.835 and a perceptually motivated objective measure, frequency-weighted segmental SNR, improve the evaluation of synthetic speech when noise is present. The evaluation of voices adapted with noisy data shows that the noise plays a relatively small but noticeable role in the quality of synthetic speech: naturalness and speaker similarity are not significantly affected by the noise, but listeners prefer the voices trained from cleaner data. Noise removal, even when it degrades natural speech quality, improves the synthetic voice.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2013.6639005