Speaker interpolation for HMM-based speech synthesis system
This paper describes an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system using speaker interpolation. Although most text-to-speech synthesis systems which synthesize speech by concatenating speech units can synthesize speech with acceptable quality, they s...
Published in: | Acoustical Science and Technology 2000, Vol.21(4), pp.199-206 |
---|---|
Main Authors: | Yoshimura, Takayoshi; Tokuda, Keiichi; Masuko, Takashi; Kobayashi, Takao; Kitamura, Tadashi |
Format: | Article |
Language: | English |
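Subjects: | Hidden Markov model (HMM); Kullback information measure; Speaker interpolation; Text-to-speech synthesis (TTS) |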
cited_by | cdi_FETCH-LOGICAL-c4859-c7203ce926c3144a6caaa1d678452ead15a328213274ba2dd583b4efd13ddb8f3 |
---|---|
cites | |
container_end_page | 206 |
container_issue | 4 |
container_start_page | 199 |
container_title | Acoustical Science and Technology |
container_volume | 21 |
creator | Yoshimura, Takayoshi; Tokuda, Keiichi; Masuko, Takashi; Kobayashi, Takao; Kitamura, Tadashi |
description | This paper describes an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system using speaker interpolation. Although most text-to-speech synthesis systems that synthesize speech by concatenating speech units can produce speech of acceptable quality, they still cannot synthesize speech with varied voice qualities such as speaker individualities and emotions. To control speaker individualities and emotions, therefore, they need a large database of speech units with various voice characteristics in the synthesis phase. Our system, on the other hand, synthesizes speech with an untrained speaker’s voice quality by interpolating HMM parameters among some representative speakers’ HMM sets. Accordingly, our system can synthesize speech with varied voice qualities without a large database in the synthesis phase. An HMM interpolation technique is derived from a probabilistic similarity measure for HMMs and is used to synthesize speech with an untrained speaker’s voice quality by interpolating HMM parameters among some representative speakers’ HMM sets. The results of subjective experiments show that we can gradually change the voice quality of synthesized speech from one speaker’s to the other’s by changing the interpolation ratio. |
doi_str_mv | 10.1250/ast.21.199 |
format | article |
publisher | Tokyo: Acoustical Society of Japan |
rights | 2000 by The Acoustical Society of Japan; Copyright Japan Science and Technology Agency 2000 |
eissn | 1347-5177 |
tpages | 8 |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1346-3969 |
ispartof | Acoustical Science and Technology, 2000, Vol.21(4), pp.199-206 |
issn | 1346-3969 1347-5177 |
language | eng |
recordid | cdi_proquest_journals_1448255356 |
source | J-STAGE Free |
subjects | Hidden Markov model (HMM); Kullback information measure; Speaker interpolation; Text-to-speech synthesis (TTS) |
title | Speaker interpolation for HMM-based speech synthesis system |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T01%3A11%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_jstag&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Speaker%20interpolation%20for%20HMM-based%20speech%20synthesis%20system&rft.jtitle=Acoustical%20Science%20and%20Technology&rft.au=Yoshimura,%20Takayoshi&rft.date=2000&rft.volume=21&rft.issue=4&rft.spage=199&rft.epage=206&rft.pages=199-206&rft.issn=1346-3969&rft.eissn=1347-5177&rft_id=info:doi/10.1250/ast.21.199&rft_dat=%3Cproquest_jstag%3E3116612911%3C/proquest_jstag%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4859-c7203ce926c3144a6caaa1d678452ead15a328213274ba2dd583b4efd13ddb8f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1448255356&rft_id=info:pmid/&rfr_iscdi=true |
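
The description in this record says that HMM parameters are interpolated among representative speakers' HMM sets, and the subject list mentions the Kullback information measure. As a rough illustration only, the sketch below interpolates a single Gaussian state output distribution with diagonal covariance across speakers. It uses the closed form that minimizes the weighted sum of Kullback-Leibler divergences from the interpolated Gaussian to each speaker's Gaussian. Whether this matches the paper's exact formulation is an assumption; the function name `interpolate_gaussians` and the data layout are hypothetical and do not come from the record.

```python
def interpolate_gaussians(means, variances, weights):
    """Interpolate diagonal-covariance Gaussians from several speakers.

    means, variances: one list of floats per speaker, all of equal length.
    weights: interpolation ratios, one per speaker, summing to 1.

    Illustrative closed form (an assumption, not taken from the record):
        precision = sum_k a_k / var_k
        var       = 1 / precision
        mean      = var * sum_k a_k * mean_k / var_k
    """
    if not (len(means) == len(variances) == len(weights)):
        raise ValueError("means, variances and weights must have equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation ratios must sum to 1")

    dim = len(means[0])
    out_mean, out_var = [], []
    for d in range(dim):
        # Weighted sum of per-speaker precisions for this dimension.
        precision = sum(a / v[d] for a, v in zip(weights, variances))
        # Precision-weighted sum of per-speaker means.
        weighted_mean = sum(
            a * m[d] / v[d] for a, m, v in zip(weights, means, variances)
        )
        out_var.append(1.0 / precision)
        out_mean.append(weighted_mean / precision)
    return out_mean, out_var


if __name__ == "__main__":
    # Two hypothetical speakers; sweep the ratio from speaker A to speaker B.
    speaker_a = ([0.0, 1.0], [1.0, 2.0])
    speaker_b = ([2.0, 3.0], [4.0, 2.0])
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
        mean, var = interpolate_gaussians(
            [speaker_a[0], speaker_b[0]],
            [speaker_a[1], speaker_b[1]],
            [1.0 - ratio, ratio],
        )
        print(ratio, mean, var)
```

With a ratio of 0 or 1 the result reduces to one speaker's own distribution, and intermediate ratios move the parameters gradually between the two, which is the behaviour the description reports for the synthesized voice quality. A full system would apply such a rule to every state of every HMM; that part is not sketched here.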