Phone Segmentation for Japanese Triphthong Using Neural Networks

Bibliographic Details
Main Authors:	Banik, M., Hossain, Md Modasser, Saha, Aloke Kumar, Hassan, Foyzul, Kotwal, Mohammed Rokibul Alam, Huda, Mohammad Nurul
Format:	Conference Proceeding
Language:	English
Subjects:	Context; Distinctive Phonetic Feature; Feature extraction; Hidden Markov Model; Hidden Markov models; Local Features; Mel frequency cepstral coefficient; Multi-Layer Neural Network; Recurrent Neural Network; Recurrent neural networks; Speech recognition

Description
Summary:	Context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems have addressed this problem by using context-sensitive tri-phone models. However, these models need a large number of speech parameters and a large speech corpus. In this paper, we propose a technique to model the dynamic process of co-articulation and embed it in ASR systems. A Recurrent Neural Network (RNN) is expected to realize this dynamic process, but the main problem is that training a large RNN is slow. We introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system that consists of a Multi-Layer Neural Network (MLN) in the first stage and another MLN in the second stage, where the first MLN is expected to reduce the dynamics of the acoustic feature pattern and the second MLN to suppress the fluctuation caused by DPF context. The experiments are carried out using Japanese triphthong data. The proposed DPF-based feature extractor provides better segmentation performance with a reduced mixture-set of HMMs. A better context effect is achieved with less computation by using MLNs instead of an RNN.
DOI:	10.1109/ITNG.2011.88