Loading…

Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space

This paper proposes a rule-based voice conversion system for emotion which is capable of converting neutral speech to emotional speech using dimensional space (arousal and valence) to control the degree of emotion on a continuous scale. We propose an inverse three-layered model with acoustic feature...

Full description

Saved in:
Bibliographic Details
Published in:Speech communication 2018-09, Vol.102, p.54-67
Main Authors: Xue, Yawen, Hamada, Yasuhiro, Akagi, Masato
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper proposes a rule-based voice conversion system for emotion which is capable of converting neutral speech to emotional speech using dimensional space (arousal and valence) to control the degree of emotion on a continuous scale. We propose an inverse three-layered model with acoustic features as output at the top layer, semantic primitives at the middle layer and emotion dimension as input at the bottom layer; an adaptive-based fuzzy inference system acts as connectors to extract the non-linear rules among the three layers. The rules are applied by modifying the acoustic features of neutral speech to create the different types of emotional speech. The prosody-related acoustic features of F0 and power envelope are parameterized using the Fujisaki model and target prediction model separately. Perceptual evaluation results show that the degree of emotion can be perceived well in the dimensional space of valence and arousal.
ISSN:0167-6393
1872-7182
DOI:10.1016/j.specom.2018.06.006