
WEST: Word Encoded Sequence Transducers

Bibliographic Details
Main Authors: Variani, Ehsan; Suresh, Ananda Theertha; Weintraub, Mitchel
Format: Conference Proceeding
Language: English
Description
Summary: Most of the parameters in large vocabulary models are used in the embedding layer to map categorical features to vectors and in the softmax layer for classification weights. This is a bottleneck in memory-constrained on-device training applications such as federated learning and in on-device inference applications such as automatic speech recognition (ASR). One way of compressing the embedding and softmax layers is to substitute larger units such as words with smaller sub-units such as characters. However, sub-unit models often perform poorly compared to the larger-unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain-dependent sub-units, and demonstrate that this transduction can lead to significant compression without compromising performance.
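The abstract describes the idea only at a high level. As a rough illustration, and not the paper's actual construction, the sketch below shows how replacing a per-word embedding table with a small table of sub-unit embeddings reduces parameters: each word is assigned a fixed sequence of random sub-unit codes, and its vector is composed from the corresponding sub-unit embeddings. The sizes, the random code assignment, and the sum composition are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only (not taken from the paper).
vocab_size = 100_000   # number of words V
embed_dim = 256        # embedding dimension d
num_codes = 1_000      # size of the sub-unit inventory K
code_len = 4           # sub-units used to encode each word

rng = np.random.default_rng(0)

# Baseline: a full word embedding table holds V x d parameters.
baseline_params = vocab_size * embed_dim

# Random encoding: each word is mapped to a fixed sequence of sub-unit ids.
word_to_codes = rng.integers(0, num_codes, size=(vocab_size, code_len))

# Only the small sub-unit table is trained: K x d parameters.
subunit_table = 0.01 * rng.standard_normal((num_codes, embed_dim))
compressed_params = num_codes * embed_dim

def embed(word_id: int) -> np.ndarray:
    """Compose a word vector from its sub-unit embeddings (a simple sum here)."""
    return subunit_table[word_to_codes[word_id]].sum(axis=0)

vec = embed(42)
print(vec.shape)                            # (256,)
print(baseline_params / compressed_params)  # ~100x fewer embedding parameters
```

The same substitution can in principle be applied on the output side, replacing a word-level softmax with scores composed from sub-unit weights; how to choose the sub-unit codes (random versus domain-dependent) and compose them without hurting accuracy is the subject of the paper itself.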
ISSN:2379-190X
DOI:10.1109/ICASSP.2019.8683694