Frame-Based SEMG-to-Speech Conversion

Bibliographic Details
Main Authors: Yuet-Ming Lam, P. H.-W. Leong, Man-Wai Mak
Format: Conference Proceeding
Language: English
Description
Summary: This paper presents a methodology that uses surface electromyogram (SEMG) signals recorded from the cheek and chin to synthesize speech. A neural network is trained to map SEMG features (short-time Fourier transform coefficients) to vector-quantized codebook indices of speech features (linear prediction coefficients, pitch, and energy). To synthesize a word, the SEMG signals recorded while the word is pronounced are blocked into frames; SEMG features are then extracted from each frame and presented to the neural network to obtain a sequence of speech feature indices. The waveform of the word is then constructed by concatenating the pre-recorded speech segments corresponding to those indices. Experimental evaluations based on the synthesis of eight words show that, on average, over 70% of the words can be synthesized correctly and that the neural network can classify SEMG frames into seven phonemes and silence at a rate of 77.8%. The rate can be further improved to 88.3% by assuming medium-time stationarity of the speech signals. The experimental results demonstrate the feasibility of synthesizing words from SEMG signals alone.
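The frame-based pipeline described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the frame length, hop size, number of STFT coefficients, and smoothing window are assumed values, and `classify` stands in for the paper's trained neural network. The majority-vote smoother is one plausible reading of the "medium-time stationarity" assumption (neighboring frames are likely to share a phoneme label).

```python
import numpy as np

def block_frames(semg, frame_len=256, hop=128):
    """Split an SEMG signal into overlapping frames (sizes are illustrative)."""
    n = 1 + max(0, (len(semg) - frame_len) // hop)
    return np.stack([semg[i * hop : i * hop + frame_len] for i in range(n)])

def stft_features(frame, n_coeffs=16):
    """Short-time Fourier transform magnitudes as the per-frame feature vector."""
    return np.abs(np.fft.rfft(frame))[:n_coeffs]

def smooth_indices(indices, win=3):
    """Majority vote over a sliding window of frame labels -- one way to
    exploit medium-time stationarity of the underlying speech."""
    out = []
    for t in range(len(indices)):
        lo, hi = max(0, t - win // 2), min(len(indices), t + win // 2 + 1)
        vals, counts = np.unique(indices[lo:hi], return_counts=True)
        out.append(int(vals[np.argmax(counts)]))
    return out

def synthesize(semg, classify, codebook_segments):
    """Map each SEMG frame to a codebook index via the classifier, then
    concatenate the pre-recorded speech segments for those indices."""
    frames = block_frames(semg)
    indices = [classify(stft_features(f)) for f in frames]
    indices = smooth_indices(indices)
    return np.concatenate([codebook_segments[i] for i in indices]), indices
```

A usage sketch: with a trained classifier in place of `classify` and a table of pre-recorded segments keyed by codebook index, calling `synthesize(semg, classify, codebook_segments)` yields the concatenated waveform and the per-frame index sequence.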
ISSN: 1548-3746, 1558-3899
DOI: 10.1109/MWSCAS.2006.382042