Textual keyword extraction and summarization: State-of-the-art
Published in: | Information processing & management 2019-11, Vol.56 (6), p.102088, Article 102088 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Unsupervised learning approaches are widely employed for keyword extraction. • Recent use of deep neural networks has significantly improved abstractive summarization. • Deep learning frameworks are less often applied to keyword extraction. • For deep learning models to be relied upon, their interpretability is of immense importance. • Scarcity of datasets for ill-structured and informal data has resulted in limited progress in the relevant domains. • Existing evaluation metrics have limited ability to determine semantic equivalence between machine-generated and human-generated summaries.
With the advent of Web 2.0, many online platforms, such as social networks, blogs, and magazines, produce massive amounts of textual data. This textual data carries information that can be used for the betterment of humanity, so there is a pressing need to extract potentially useful information from it. This study presents an overview of approaches that can be applied to extract these valuable information nuggets from text and then present them in a brief, clear, and concise way. To this end, two major tasks are reviewed: automatic keyword extraction and text summarization. To compile the literature, scientific articles were collected from major digital computing research repositories. Based on the collected literature, the survey covers early approaches through to recent advances using machine learning solutions. The survey finds that annotated benchmark datasets are not available for various sources of textual data, such as Twitter and social forums. This scarcity of datasets has resulted in relatively little progress in many domains. Also, the application of deep learning techniques to automatic keyword extraction remains relatively unaddressed; hence, the impact of various deep architectures stands as an open research direction. For the text summarization task, deep learning techniques have been applied since the advent of word vectors and currently define the state of the art for abstractive summarization. Currently, one of the major challenges in these tasks is semantic-aware evaluation of generated results. |
---|---|
ISSN: | 0306-4573 1873-5371 |
DOI: | 10.1016/j.ipm.2019.102088 |