Performance analysis of ASR system in hybrid DNN-HMM framework using a PWL euclidean activation function

Bibliographic Details
Published in: Frontiers of Computer Science 2021-08, Vol.15 (4), p.154705, Article 154705
Main Authors: DUTTA, Anirban; ASHISHKUMAR, Gudmalwar; RAO, Ch V Rama
Format: Article
Language: English
Description
Summary: Automatic Speech Recognition (ASR) is the process of mapping an acoustic speech signal into human-readable text. Traditional systems model the acoustic component of ASR using the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) approach. Deep Neural Networks (DNNs) open up new possibilities for overcoming the shortcomings of conventional statistical algorithms. Recent studies have modeled the acoustic component of the ASR system with a DNN in the so-called hybrid DNN-HMM approach. Among the activation functions used to model non-linearity in DNNs, Rectified Linear Units (ReLU) and maxout units are the most commonly used in ASR systems. This paper concentrates on the acoustic component of a hybrid DNN-HMM system by proposing an efficient activation function for the DNN. Inspired by previous works, a Euclidean norm activation function is proposed to model the non-linearity of the DNN. This non-linearity is shown to belong to the family of Piecewise Linear (PWL) functions with distinct features. These functions can capture deep hierarchical features of the pattern. The relevance of the proposal is examined in depth both theoretically and experimentally. The performance of the developed ASR system is evaluated in terms of Phone Error Rate (PER) on the TIMIT database. Experimental results show a relative improvement in performance when the proposed function is used in place of conventional activation functions.
ISSN: 2095-2228, 2095-2236
DOI: 10.1007/s11704-020-9419-z
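
To make the terms in the summary concrete, the following is a minimal Python sketch of the two conventional units it names (ReLU and maxout), one possible reading of a Euclidean norm unit, and the PER metric. The record does not give the paper's exact definition of the proposed function, so `euclidean_norm_unit` (the L2 norm over a group of pre-activations) is an assumption made for illustration only, and all function names here are hypothetical. PER is computed in the standard way, as the edit distance between phone sequences divided by the reference length.

```python
import math


def relu(z):
    """ReLU: pass positive pre-activations through, clamp the rest to zero."""
    return max(0.0, z)


def maxout(group):
    """Maxout unit: output the largest pre-activation in a group."""
    return max(group)


def euclidean_norm_unit(group):
    """ASSUMED form, not taken from the paper: L2 norm of a group of pre-activations."""
    return math.sqrt(sum(z * z for z in group))


def phone_error_rate(reference, hypothesis):
    """PER = (substitutions + deletions + insertions) / number of reference phones."""
    if not reference:
        raise ValueError("reference phone sequence must not be empty")
    # Levenshtein distance over phone labels, keeping one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    group = [3.0, -4.0]  # made-up pre-activations for one group
    print(relu(-2.0), maxout(group), euclidean_norm_unit(group))
    # Made-up phone sequences: one substitution and one deletion.
    print(phone_error_rate(["sil", "k", "ae", "t", "sil"], ["sil", "k", "ah", "t"]))
```

Unlike maxout, which keeps only the largest value in a group, the assumed norm unit combines every value in the group; whether this matches the published formulation should be checked against the full text.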