On the Importance of Attention in Meta-Learning for Few-Shot Text Classification

Bibliographic Details
Published in:	arXiv.org 2018-06
Main Authors:	Jiang, Xiang; Havaei, Mohammad; Chartrand, Gabriel; Chouaib, Hassan; Thomas, Vincent; Jesson, Andrew; Chapados, Nicolas; Matwin, Stan
Format:	Article
Language:	English
Subjects:	Adaptation; Algorithms; Classification; Machine learning; Parameterization; Representations; Synergistic effect; Text editing

Description
Summary:	Current deep-learning-based text classification methods are limited in their ability to learn quickly and generalize when data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Based on the Model-Agnostic Meta-Learning (MAML) framework, we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The essential difference between MAML and ATAML is the separation of task-agnostic representation learning from task-specific attentive adaptation. ATAML is designed to encourage task-agnostic representation learning through task-agnostic parameterization and to facilitate task-specific adaptation via attention mechanisms. We provide evidence that the attention mechanism in ATAML has a synergistic effect on learning performance. Compared with models trained from random initialization, pretrained models, and meta-trained MAML, the proposed ATAML method generalizes better on single-label and multi-label classification tasks on the miniRCV1 and miniReuters-21578 datasets.
ISSN:	2331-8422
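
The separation described in the summary, shared task-agnostic representations plus task-specific attentive adaptation, can be illustrated with a small sketch. This is not the authors' implementation: it is one reading of the summary, assuming a shared embedding table `E` that stays fixed during per-task adaptation and a task-specific vector `theta` holding attention and classifier parameters. All names (`support_loss`, `adapt_task`, `numerical_grad`) are hypothetical, gradients are taken by finite differences for brevity, and the outer meta-training loop that would update `E` across tasks is omitted.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def support_loss(theta, E, support):
    """Mean binary cross-entropy on one task's support set.

    theta   : flat task-specific vector [attention a | classifier w], shape (2*D,)
    E       : shared (task-agnostic) embedding table, shape (V, D)
    support : list of (token_ids, label) pairs with label in {0, 1}
    """
    D = E.shape[1]
    a, w = theta[:D], theta[D:]
    total = 0.0
    for tokens, y in support:
        H = E[tokens]              # (T, D) shared token representations
        alpha = softmax(H @ a)     # (T,) task-specific attention weights
        logit = (alpha @ H) @ w    # attended context vector -> scalar score
        # BCE with logits: log(1 + exp(logit)) - y * logit
        total += np.logaddexp(0.0, logit) - y * logit
    return total / len(support)

def numerical_grad(f, theta, eps=1e-5):
    """Central-difference gradient; stands in for autodiff in this sketch."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def adapt_task(theta0, E, support, lr=0.2, steps=10):
    """Inner-loop adaptation: only theta changes; the shared table E is read-only."""
    theta = theta0.copy()
    for _ in range(steps):
        g = numerical_grad(lambda t: support_loss(t, E, support), theta)
        theta = theta - lr * g
    return theta

# Toy usage with made-up data (assumption: 20-word vocabulary, 8-dim embeddings).
rng = np.random.default_rng(0)
E = 0.5 * rng.normal(size=(20, 8))
support = [(rng.integers(0, 20, size=5), y) for y in (1, 0, 1, 0)]
theta0 = np.zeros(16)
theta = adapt_task(theta0, E, support)
print(support_loss(theta0, E, support), support_loss(theta, E, support))
```

In this sketch a new task reuses `E` unchanged and adapts only `theta`, which is the kind of split between shared and task-specific parameters the summary contrasts with MAML, where all parameters would be adapted per task.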