Agglomerative vs. tree-based clustering for the definition of multilingual set of triphones

Bibliographic Details
Main Authors: Imperl, B., Kacic, Z., Horvat, B., Zgank, A.
Format: Conference Proceeding
Language: English
Description
Summary: The paper addresses the problem of multilingual acoustic modelling for the design of multilingual speech recognisers. Two approaches to defining a multilingual set of triphones, bottom-up and top-down, are investigated, and a new clustering algorithm for this task is proposed. The agglomerative (bottom-up) clustering algorithm is based on a distance measure for triphones defined as a weighted sum of explicit estimates of context similarity at the monophone level. The monophone similarity estimation method is based on the algorithm of Houtgast. The second type of system uses tree-based (top-down) clustering with a common decision tree. The experiments were based on the SpeechDat II databases (Slovenian, Spanish and German 1000 FDB SpeechDat II). They showed that the agglomerative clustering algorithm significantly reduces the number of triphones with only minor degradation of word accuracy.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2000.861809
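
Illustration: the summary describes a bottom-up procedure in which triphones are merged according to a distance built from monophone-level similarities. The following is a minimal sketch of that general idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the similarity values in SIM are made up (the paper derives them with a Houtgast-based method that is not reproduced here), and the weights, the inclusion of the centre phone in the sum, the complete-linkage rule, the threshold and the names mono_sim, triphone_distance and agglomerate are all hypothetical.

```python
from itertools import combinations

# Hypothetical monophone similarities in [0, 1]; these numbers are invented
# for illustration and are NOT taken from the paper.
SIM = {
    frozenset(("a", "a:")): 0.9,
    frozenset(("s", "z")): 0.7,
    frozenset(("t", "d")): 0.6,
}


def mono_sim(p, q):
    """Similarity of two monophones: 1.0 if identical, else table lookup (default 0.0)."""
    if p == q:
        return 1.0
    return SIM.get(frozenset((p, q)), 0.0)


def triphone_distance(t1, t2, weights=(0.25, 0.5, 0.25)):
    """Distance between triphones given as (left, centre, right) tuples.

    Assumed form: 1 minus a weighted sum of the monophone similarities
    at each of the three positions. The weights are illustrative.
    """
    similarity = sum(w * mono_sim(p, q) for w, p, q in zip(weights, t1, t2))
    return 1.0 - similarity


def agglomerate(triphones, threshold):
    """Greedy bottom-up clustering with complete linkage (an assumed choice).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`.
    """
    clusters = [[t] for t in triphones]

    def linkage(c1, c2):
        # Complete linkage: distance of the farthest pair of members.
        return max(triphone_distance(a, b) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    demo = [("s", "a", "t"), ("z", "a:", "d"), ("k", "o", "p")]
    for cluster in agglomerate(demo, threshold=0.3):
        print(cluster)
```

In this sketch a lower threshold keeps more distinct triphones, while a higher one merges more aggressively; that corresponds to the trade-off between the number of triphones and word accuracy mentioned in the summary.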