A Multi-Task Hierarchical Approach for Intent Detection and Slot Filling
Published in: Knowledge-Based Systems, 2019-11, Vol. 183, p. 104846, Article 104846
Main Authors:
Format: Article
Language: English
Subjects:
Summary: Spoken language understanding (SLU) plays an integral part in every dialogue system. Understanding the user's intention and extracting the necessary information to help the user achieve their desired goals is a challenging task. In this work, we propose an end-to-end hierarchical multi-task model that can jointly perform both intent detection and slot filling for datasets of varying domains. The primary aim is to capture context information in a dialogue so that the SLU module of a dialogue system can correctly understand the user and assist in achieving the desired goals. It is vital for the SLU module to capture past information along with the user's present utterance in order to retrieve the correct information. The dependency and correlation between the two tasks, i.e. intent detection and slot filling, make a multi-task learning framework effective in capturing the desired information provided by the user. We use a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to capture contextual information for the utterances, and we employ a Conditional Random Field (CRF) to model label dependencies. Both character- and word-level embeddings are provided as input to the models. We create a benchmark corpus for the SLU tasks on the TRAINS and FRAMES datasets, capturing the more realistic and natural utterances spoken by speakers in a human/machine dialogue system. Experimental results on multiple datasets from various domains (ATIS, SNIP, TRAINS and FRAMES) show that our proposed approach is effective compared to individual models and state-of-the-art methods.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2019.07.017
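
The summary above describes a joint model that combines character- and word-level embeddings, CNN/RNN encoding, and multi-task heads for intent detection and slot filling. The sketch below is an illustrative PyTorch reconstruction of that kind of architecture, not the authors' implementation: the `JointSLUModel` class name, all layer sizes, and the mean-pooled intent head are assumptions, and the CRF the paper places over slot emissions is replaced here by a plain softmax head to keep the example short.

```python
# Minimal sketch (assumption, not the authors' code) of a joint intent-detection /
# slot-filling model: character-level CNN + word embeddings -> BiLSTM encoder ->
# one head per task. The paper additionally models slot-label dependencies with a
# CRF; a plain linear/softmax slot head is used here for brevity.
import torch
import torch.nn as nn


class JointSLUModel(nn.Module):  # hypothetical class name
    def __init__(self, vocab_size, char_vocab_size, num_intents, num_slots,
                 word_dim=100, char_dim=25, char_channels=50, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=0)
        # Character-level CNN: convolve over each word's characters and
        # max-pool to a fixed-size vector per word.
        self.char_cnn = nn.Conv1d(char_dim, char_channels, kernel_size=3, padding=1)
        # BiLSTM over the concatenated word + character representations.
        self.encoder = nn.LSTM(word_dim + char_channels, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)  # utterance-level
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)      # token-level

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        batch, seq_len, max_chars = char_ids.shape
        words = self.word_emb(word_ids)                                  # (B, T, word_dim)
        chars = self.char_emb(char_ids.view(-1, max_chars))              # (B*T, L, char_dim)
        chars = self.char_cnn(chars.transpose(1, 2)).max(dim=2).values   # (B*T, char_channels)
        chars = chars.view(batch, seq_len, -1)                           # (B, T, char_channels)
        enc, _ = self.encoder(torch.cat([words, chars], dim=-1))         # (B, T, 2*hidden)
        intent_logits = self.intent_head(enc.mean(dim=1))                # pooled utterance
        slot_logits = self.slot_head(enc)                                # one prediction per token
        return intent_logits, slot_logits


if __name__ == "__main__":
    # Toy usage with made-up vocabulary and label sizes; the multi-task loss
    # would simply sum a cross-entropy term for each head.
    model = JointSLUModel(vocab_size=1000, char_vocab_size=60,
                          num_intents=7, num_slots=25)
    word_ids = torch.randint(1, 1000, (2, 12))
    char_ids = torch.randint(1, 60, (2, 12, 10))
    intent_logits, slot_logits = model(word_ids, char_ids)
    print(intent_logits.shape, slot_logits.shape)  # (2, 7), (2, 12, 25)
```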