
Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning


Bibliographic Details
Published in: Expert Systems with Applications, 2019-06, Vol. 123, pp. 195-211
Main Authors: Alami, Nabil; Meknassi, Mohammed; En-nahnahi, Noureddine
Format: Article
Language:English
Description
Summary:
•Word2vec representation improves the summarization task compared to bag of words.
•Feature learning using unsupervised neural networks improves the summarization task.
•Unsupervised neural networks trained on word2vec vectors give promising results.
•Ensemble learning with word2vec representation obtains the best results.

The vast amounts of data being collected and analyzed have become an invaluable source of information, which needs to be easily handled by humans. Automatic Text Summarization (ATS) systems enable users to get the gist of information and knowledge in a short time so that they can make critical decisions quickly. Deep neural networks have proven their ability to achieve excellent performance in many real-world Natural Language Processing and computer vision applications, yet they have received little attention in ATS. The key problem of traditional approaches is that they involve high-dimensional and sparse data, which makes it difficult to capture relevant information. One technique for overcoming these problems is learning features via dimensionality reduction. Word embedding, another neural network technique, generates a much more compact word representation than the traditional Bag-of-Words (BOW) approach. In this paper, we seek to enhance the quality of ATS by integrating unsupervised deep neural network techniques with the word embedding approach. First, we develop a word-embedding-based text summarization model, and we show that the Word2Vec representation gives better results than the traditional BOW representation. Second, we propose further models that combine Word2Vec with unsupervised feature learning methods in order to merge information from different sources; we show that unsupervised neural network models trained on the Word2Vec representation give better results than those trained on the BOW representation. Third, we propose three ensemble techniques: the first combines BOW and Word2Vec using a majority voting technique; the second aggregates the information provided by the BOW approach and unsupervised neural networks; the third aggregates the information provided by Word2Vec and unsupervised neural networks. We show that the ensemble methods improve the quality of ATS, and in particular the ensemble based on the Word2Vec approach gives better results. Finally, we perform different experiments to evaluate the performance of the investigated models. We use two kinds of datasets that are publicly available for…
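The abstract describes two techniques concretely enough to illustrate: scoring sentences with averaged Word2Vec vectors and combining the sentence selections of different representations by majority voting. The Python sketch below is a rough, hypothetical illustration of those ideas only, not the authors' implementation; the word-vector lookup word_vecs and its dimensionality dim are assumed inputs.

import numpy as np

def sentence_embedding(sentence, word_vecs, dim):
    # Average the vectors of in-vocabulary words; zero vector if none are known.
    vecs = [word_vecs[w] for w in sentence.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def rank_sentences(sentences, word_vecs, dim):
    # Score each sentence by cosine similarity to the document centroid.
    embs = np.array([sentence_embedding(s, word_vecs, dim) for s in sentences])
    centroid = embs.mean(axis=0)
    norms = np.linalg.norm(embs, axis=1) * np.linalg.norm(centroid) + 1e-9
    return embs @ centroid / norms

def majority_vote(selections):
    # selections: one 0/1 array per model, marking which sentences it picked.
    votes = np.sum(selections, axis=0)
    return (2 * votes > len(selections)).astype(int)  # strict majority wins

With only two voters (e.g. the BOW-based and Word2Vec-based selections), a strict majority requires agreement, so in practice an odd number of models or a tie-breaking score is used; the extractive summary is then the top-ranked or majority-selected sentences in document order.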
ISSN: 0957-4174
eISSN: 1873-6793
DOI: 10.1016/j.eswa.2019.01.037