A factorization network based method for multi-lingual domain classification

Bibliographic Details
Main Authors: Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang, Kaisheng Yao, Hu Chen, Yuanhang Zou, Baolin Peng
Format: Conference Proceeding
Language: English
Summary: In many spoken language understanding (SLU) systems, domain classification is the most crucial component, as system responses based on wrong domains often yield very unpleasant user experiences. In multi-lingual domain classification, the training data for some low-resource languages often comes from machine translation, which distorts some of the higher-order n-gram features. Feature co-occurrence therefore becomes a more reliable signal in multi-lingual domain classification. In this paper, in order to model feature co-occurrences effectively, we propose Factorization Networks (FNs), which combine Factorization Machines (FMs) with Neural Networks (NNs). FNs extend the linear connections from the input feature layer to the hidden layer in NNs into factorization connections, which represent the weights of feature co-occurrences in a factorized form. In addition to FNs, we also propose a hybrid model that integrates FNs, NNs and Maximum Entropy (ME) models; the component models in the hybrid share the same input features. On two data sets (the ATIS data set and a Microsoft Cortana Chinese data set), the proposed models show promising results. In particular, on the large Microsoft Cortana Chinese data set, which is translated from well-annotated English data, FNs using unigram, class and query-length features achieve more than 20% relative error reduction over linear SVMs.
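The factorized co-occurrence weighting that FNs borrow from Factorization Machines can be illustrated with a short sketch. This is not the paper's FN layer (which replaces the input-to-hidden connections of an NN); it is only the standard FM second-order term, where the weight of each feature pair (i, j) is the inner product of learned factor vectors v_i and v_j, computed in O(nk) time via the usual sum-of-squares identity. All names here are illustrative.

```python
import numpy as np

def fm_pairwise(x, V):
    """FM-style second-order term: sum_{i<j} <v_i, v_j> x_i x_j.

    x : (n,)   feature vector
    V : (n, k) one k-dimensional factor vector per feature

    Uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * ( ||sum_i v_i x_i||^2 - sum_i ||v_i||^2 x_i^2 ),
    which avoids the explicit O(n^2) pairwise loop.
    """
    s = V.T @ x  # (k,) aggregate of factor vectors weighted by features
    return 0.5 * (s @ s - np.sum((V ** 2).T @ (x ** 2)))

# Sanity check against the explicit pairwise sum.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
V = rng.normal(size=(5, 3))
brute = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(5) for j in range(i + 1, 5))
assert np.isclose(fm_pairwise(x, V), brute)
```

Because pairwise weights are factorized through the shared vectors v_i, a co-occurrence (i, j) never seen in training still receives a meaningful weight, which is the property that makes this modeling attractive when higher-order n-grams are unreliable.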
ISSN: 1520-6149
2379-190X
DOI: 10.1109/ICASSP.2015.7178978