Language Model Using Neural Turing Machine Based on Localized Content-Based Addressing

Bibliographic Details
Published in:	Applied sciences 2020-10, Vol.10 (20), p.7181
Main Authors:	Lee, Donghyun; Park, Jeong-Sik; Koo, Myoung-Wan; Kim, Ji-Hwan
Format:	Article
Language:	English
Subjects:	Analysis; Benchmarks; content-based addressing; Datasets; Deep learning; Language; language model; Long short-term memory; Long-term memory; memory de-allocation mechanism-based neural Turing machine; Neural networks; neural Turing machine; Recurrent neural networks; Semantics; Similarity

Description
Summary:	The performance of long short-term memory (LSTM) recurrent neural network (RNN)-based language models has improved on language model benchmarks. Although recurrent layers are widely used, previous studies showed that an LSTM RNN-based language model (LM) cannot overcome the limitation of the context length. To train LMs on longer sequences, attention mechanism-based models have recently been used. In this paper, we propose an LM using a neural Turing machine (NTM) architecture based on localized content-based addressing (LCA). The NTM architecture is an attention-based model. However, the NTM encounters a problem with content-based addressing because all memory addresses need to be accessed for calculating cosine similarities. To address this problem, we propose an LCA method. The LCA method searches for the maximum of all cosine similarities generated from all memory addresses. Next, a specific memory area including the selected memory address is normalized with the softmax function. The LCA method is applied to a pre-trained NTM-based LM during the test stage. The proposed architecture is evaluated on the Penn Treebank and enwik8 LM tasks. The experimental results indicate that the proposed approach outperforms the previous NTM architecture.
ISSN:	2076-3417
DOI:	10.3390/app10207181
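
The two LCA steps described in the summary (find the address with the maximum cosine similarity, then apply softmax only over a memory area containing it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `radius` parameter that sets the size of the local area, the `beta` key-strength factor, the choice to center the area on the selected address, and the choice to give zero weight to addresses outside the area are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarities(memory, key, eps=1e-8):
    """Cosine similarity between the key and every memory row."""
    mem_norms = np.linalg.norm(memory, axis=1)
    key_norm = np.linalg.norm(key)
    return memory @ key / (mem_norms * key_norm + eps)


def lca_weights(memory, key, radius=2, beta=1.0):
    """Hypothetical localized content-based addressing weights.

    Assumption: the local area is a window of up to 2 * radius + 1
    addresses centered on the best-matching address and clipped at
    the memory borders.
    """
    sims = cosine_similarities(memory, key)

    # Step 1: select the address with the maximum cosine similarity.
    center = int(np.argmax(sims))

    # Step 2: softmax restricted to the area around that address.
    lo = max(0, center - radius)
    hi = min(len(sims), center + radius + 1)
    local = beta * sims[lo:hi]
    local = np.exp(local - local.max())  # subtract max for stability

    # Assumption: addresses outside the area receive zero weight.
    weights = np.zeros_like(sims)
    weights[lo:hi] = local / local.sum()
    return weights


# Toy example: 8 memory addresses, each holding a 4-dimensional vector.
rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 4))
key = memory[5].copy()  # the key matches address 5 exactly

weights = lca_weights(memory, key, radius=1)
print(weights)
```

With `radius=1`, only addresses 4 to 6 receive non-zero weight and the weights sum to one, whereas standard content-based addressing would normalize over all eight addresses.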