Lower resources of spoken language understanding from voice to semantics

Bibliographic Details
Published in: Journal of physics. Conference series 2020-04, Vol.1486 (5), p.052033
Main Authors: Hao, Zhang, Cheng Guo, LV
Format: Article
Language: English
Description
Summary: Spoken language understanding is traditionally designed as a pipeline of multiple components. First, an automatic speech recognition module maps the speech signal to text; then a natural language understanding module converts the recognized text into structured data such as domain, intent, and slot values. These modules are usually trained separately. End-to-end spoken language understanding, by contrast, derives the structured data directly from speech with a single model. However, end-to-end spoken language understanding that relies on a large amount of training data is difficult to achieve across different domains and different groups of speakers. For this reason, we introduce a low-resource end-to-end spoken language understanding approach based on pre-training and combine it with capsule vectors. The experimental results show that, with low resources, the model's spoken language understanding is robust across different data sets.
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1486/5/052033