Unsupervised phoneme and word acquisition from continuous speech based on a hierarchical probabilistic generative model

Bibliographic Details
Published in: Advanced Robotics, 2023-10, Vol. 37 (19), p. 1253-1265
Main Authors: Nagano, Masatoshi; Nakamura, Tomoaki
Format: Article
Language: English
Description
Summary: Humans can segment continuous speech signals, which exhibit a double articulation structure, into phonemes and words without explicit boundary points or labels, and thereby learn a language. In constructive developmental studies, learning the double articulation structure of speech signals is important for realizing robots with human-like language learning abilities. In this study, we propose a novel probabilistic generative model, the Gaussian process-hidden semi-Markov model-based double articulation analyzer (GP-HSMM-DAA), which learns phonemes and words from continuous speech signals by hierarchically connecting two probabilistic generative models (PGMs): the Gaussian process-hidden semi-Markov model (GP-HSMM) and the hidden semi-Markov model (HSMM). In the proposed model, the parameters of the two PGMs are updated mutually and complementarily, enabling accurate learning of phonemes and words. The experimental results show that GP-HSMM-DAA segments continuous speech into phonemes and words with higher accuracy than the conventional method.
ISSN: 0169-1864, 1568-5535
DOI: 10.1080/01691864.2023.2252048