Automatic phonetic segmentation

Bibliographic Details
Published in: IEEE Transactions on Speech and Audio Processing, 2003-11, Vol. 11 (6), p. 617-625
Main Authors: Toledano, D.T., Gomez, L.A.H., Grande, L.V.
Format: Article
Language:English
Description
Summary: This paper presents the results and conclusions of a thorough study on automatic phonetic segmentation. It starts with a review of the state of the art in this field. Then, it analyzes the most frequently used approach, based on a modified Hidden Markov Model (HMM) phonetic recognizer. For this approach, a statistical correction procedure is proposed to compensate for the systematic errors produced by context-dependent HMMs, and the use of speaker adaptation techniques is considered to increase segmentation precision. Finally, this paper explores the possibility of locally refining the boundaries obtained with the former techniques. A general framework is proposed for the local refinement of boundaries, and the performance of several pattern classification approaches (fuzzy logic, neural networks, and Gaussian mixture models) is compared within this framework. The resulting phonetic segmentation scheme was able to increase the performance of a baseline HMM segmentation tool from 27.12%, 79.27%, and 97.75% of automatic boundary marks with errors smaller than 5, 20, and 50 ms, respectively, to 65.86%, 96.01%, and 99.31% in speaker-dependent mode, which is a reasonably good approximation to manual segmentation.
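The statistical correction the summary mentions can be pictured as a simple bias-removal step: on data with manual labels, measure the mean signed offset between automatic and manual boundary marks for each boundary class, then subtract that offset from new automatic boundaries. The sketch below illustrates the idea only; the class keys, function names, and data are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of per-class bias correction for boundary marks (ms).
# Assumption: each boundary has a class label (e.g. the phone transition
# type), and the HMM's placement error is systematic within each class.
from collections import defaultdict

def estimate_bias(auto_bounds, manual_bounds, classes):
    """Mean signed error (automatic - manual) per boundary class."""
    errors = defaultdict(list)
    for a, m, c in zip(auto_bounds, manual_bounds, classes):
        errors[c].append(a - m)
    return {c: sum(v) / len(v) for c, v in errors.items()}

def correct(auto_bounds, classes, bias):
    """Subtract the estimated per-class bias from each automatic boundary."""
    return [a - bias.get(c, 0.0) for a, c in zip(auto_bounds, classes)]
```

A boundary class with no examples in the labeled data is left uncorrected (bias 0.0), which is one plausible fallback when the correction must be applied to unseen transition types.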
ISSN:1063-6676
2329-9290
1558-2353
2329-9304
DOI:10.1109/TSA.2003.813579