
Feature-based hybrid strategies for gradient descent optimization in end-to-end speech recognition

Bibliographic Details
Published in: Multimedia Tools and Applications, 2022-03, Vol. 81 (7), pp. 9969-9988
Main Authors: Dokuz, Yesim; Tüfekci, Zekeriya
Format: Article
Language: English
Description
Summary: With the increasing popularity of deep learning, deep learning architectures are being utilized in speech recognition. Deep learning based speech recognition has become the state-of-the-art approach for speech recognition tasks owing to its outstanding performance relative to other methods. Generally, deep learning architectures are trained with a variant of gradient descent optimization. Mini-batch gradient descent is one such variant: it updates the network parameters after processing a number of training instances. One limitation of mini-batch gradient descent is that mini-batch samples are selected randomly from the training set. This is undesirable in speech recognition, which requires the training features to cover all possible variations in speech databases. To overcome this limitation, this study proposes hybrid mini-batch sample selection strategies. The proposed hybrid strategies combine the gender and accent features of speech databases to select mini-batch samples when training deep learning architectures. Experimental results show that using a hybrid of gender and accent features yields better speech recognition performance than using only one feature. The proposed hybrid mini-batch sample selection strategies could also benefit other application areas that have metadata information, including image recognition and machine vision.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-022-12304-5
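The summary describes mini-batch sample selection driven by speaker metadata. As a rough illustration only, the Python sketch below contrasts plain random mini-batch selection with one possible hybrid scheme: utterances are grouped by their (gender, accent) pair and each mini-batch is filled round-robin across the groups. The record does not say how the authors actually combine the two features, so the grouping rule, the `Utterance` tuple layout, and the function names `random_minibatches` and `hybrid_minibatches` are hypothetical assumptions, not the paper's method.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical utterance record: (utterance_id, gender, accent)
Utterance = Tuple[str, str, str]


def random_minibatches(data: List[Utterance], batch_size: int, seed: int = 0) -> List[List[Utterance]]:
    """Baseline: shuffle the training set and cut it into consecutive mini-batches."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def hybrid_minibatches(data: List[Utterance], batch_size: int, seed: int = 0) -> List[List[Utterance]]:
    """Illustrative (assumed) hybrid strategy: group utterances by (gender, accent)
    and interleave the groups, so each mini-batch mixes as many gender/accent
    combinations as the data allows."""
    rng = random.Random(seed)
    groups: Dict[Tuple[str, str], List[Utterance]] = defaultdict(list)
    for utt in data:
        groups[(utt[1], utt[2])].append(utt)
    # Randomize the order within each group so batches still vary between runs.
    for members in groups.values():
        rng.shuffle(members)

    # Round-robin: take one utterance from each non-empty group in turn.
    keys = sorted(groups)
    ordered: List[Utterance] = []
    while any(groups[k] for k in keys):
        for k in keys:
            if groups[k]:
                ordered.append(groups[k].pop())
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    # Toy corpus: 2 genders x 3 accents, two utterances per combination.
    genders = ["female", "male"]
    accents = ["us", "england", "indian"]
    corpus = [(f"utt{i}", genders[i % 2], accents[i % 3]) for i in range(12)]
    for batch in hybrid_minibatches(corpus, batch_size=6):
        print(sorted({(g, a) for _, g, a in batch}))
```

With this toy corpus, every mini-batch of six produced by `hybrid_minibatches` contains all six gender/accent combinations, whereas `random_minibatches` offers no such guarantee. Round-robin interleaving is only one way to spread metadata across batches; proportional or weighted sampling per group would be equally plausible readings of the abstract.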