Frame-Based SEMG-to-Speech Conversion
Format: Conference Proceeding
Language: English
Summary: This paper presents a methodology that uses surface electromyogram (SEMG) signals recorded from the cheek and chin to synthesize speech. A neural network is trained to map SEMG features (short-time Fourier transform coefficients) to vector-quantized codebook indices of speech features (linear prediction coefficients, pitch, and energy). To synthesize a word, the SEMG signals recorded while the word is pronounced are blocked into frames; SEMG features are extracted from each frame and presented to the neural network to obtain a sequence of speech-feature indices. The waveform of the word is then constructed by concatenating the pre-recorded speech segments corresponding to those indices. Experimental evaluations on the synthesis of eight words show that, on average, over 70% of the words are synthesized correctly, and that the network classifies SEMG frames into seven phonemes and silence at a rate of 77.8%. This rate improves to 88.3% when medium-time stationarity of the speech signals is assumed. The results demonstrate the feasibility of synthesizing words from SEMG signals alone.
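The frame-based front end described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the trained neural network and speech codebook are omitted, and the majority vote over neighbouring frames is one plausible reading of how "medium-time stationarity" could raise the per-frame classification rate.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Block a 1-D SEMG signal into overlapping frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def stft_features(frames):
    """Short-time Fourier transform magnitude of each windowed frame,
    standing in for the paper's SEMG features."""
    return np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))

def smooth_indices(indices, width=5):
    """Majority vote over a sliding window of frame labels: if speech is
    stationary over a few frames, isolated misclassifications are outvoted."""
    half = width // 2
    out = []
    for i in range(len(indices)):
        window = indices[max(0, i - half) : i + half + 1]
        vals, counts = np.unique(window, return_counts=True)
        out.append(vals[np.argmax(counts)])
    return np.array(out)

# In the paper's pipeline, stft_features(frame_signal(semg, ...)) would feed a
# trained neural network producing one codebook index per frame; the word
# waveform is then the concatenation of the pre-recorded segments for those
# indices. Here we only show the smoothing step on a toy label sequence.
labels = np.array([0, 0, 1, 0, 0, 2, 2, 2, 2, 2])  # lone error at frame 2
smoothed = smooth_indices(labels)
```

The isolated label `1` at frame 2 is corrected to `0` by its neighbours, which is the mechanism by which the reported classification rate could rise from 77.8% to 88.3%.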
ISSN: 1548-3746, 1558-3899
DOI: 10.1109/MWSCAS.2006.382042