A very low bit rate speech coder using HMM-based speech recognition/synthesis techniques

Bibliographic Details
Main Authors: Tokuda, K., Masuko, T., Hiroi, J., Kobayashi, T., Kitamura, T.
Format: Conference Proceeding
Language: English
Description
Summary: This paper presents a very low bit rate speech coder based on the hidden Markov model (HMM). The encoder carries out phoneme recognition and transmits phoneme indexes, state durations, and pitch information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indexes, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM by using an ML-based speech parameter generation technique. Finally, synthetic speech is obtained by exciting the MLSA (mel log spectrum approximation) filter, whose coefficients are given by the mel-cepstral coefficients, according to the pitch information. A subjective listening test shows that the performance of the proposed coder at about 150 bit/s (for test data including a 26% silence region) is comparable to that of a VQ-based vocoder at 400 bit/s (= 8 bit/frame × 50 frame/s), without pitch quantization for either coder.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.1998.675338
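
The bit-rate figures quoted in the summary can be checked with a short calculation. The sketch below only reproduces the arithmetic stated there (8 bit/frame at 50 frame/s for the VQ-based vocoder, about 150 bit/s for the HMM-based coder). The function and constant names are illustrative and do not come from the paper, and the 150 bit/s value is the approximate figure reported for test data with a 26% silence region, not an exact rate.

```python
# Figures taken from the summary; names are illustrative, not from the paper.
BITS_PER_FRAME = 8        # bits spent per frame by the VQ-based vocoder
FRAMES_PER_SECOND = 50    # frame rate of the VQ-based vocoder
HMM_CODER_BIT_RATE = 150  # approximate rate of the HMM-based coder, in bit/s


def vq_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Return the bit rate, in bit/s, of a coder that sends a fixed
    number of bits for every frame."""
    return bits_per_frame * frames_per_second


baseline = vq_bit_rate(BITS_PER_FRAME, FRAMES_PER_SECOND)
ratio = HMM_CODER_BIT_RATE / baseline

print(f"VQ-based vocoder: {baseline} bit/s")
print(f"HMM-based coder uses about {ratio:.1%} of that rate")
```

Both rates exclude pitch information, since the summary states that pitch was left unquantized for both coders in the listening test.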