Unsupervised hierarchical text summarization

Bibliographic Details
Main Authors: Divya, S., Sripriya, N.
Format: Conference Proceeding
Language: English
Description
Summary: Text summarization is the process of condensing one or more long documents into an ordered summary of informative sentences. Extractive summarization retrieves the important sentences without altering them. Each sentence in the document is represented as a vector of real numbers, termed an embedding. To embed a sentence, the values of specific features are analyzed and plotted in n-dimensional space. This enables effective prediction of the sentences that follow and precede the current sentence, so sentences with similar semantics lie close to each other. Unsupervised summarization groups similar sentences by estimating the distance between their vectors and decides whether each sentence should be included in the summary. Hierarchical summarization constructs a tree structure from the input text, and the tree is truncated according to the number of clusters. Each cluster holds semantically similar sentences. By determining the nearest neighbor in each cluster, a specified number of sentences is retrieved from each cluster for inclusion in the summary, which is at least half the size of the input document(s). The performance of hierarchical summarization is measured on the CNN/Daily Mail dataset using performance metrics. The evaluation scores indicate that the BIRCH algorithm performs better than the other hierarchical clustering techniques.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0116918
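
The pipeline described in the summary (embed each sentence, cluster the vectors hierarchically, truncate the tree at a chosen number of clusters, then keep the sentences nearest each cluster's center) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy bag-of-words embedding in place of learned sentence embeddings, and it substitutes a simple centroid-linkage agglomerative clustering for BIRCH so that it runs on the Python standard library alone. The function names `embed`, `agglomerate`, and `summarize`, and the `per_cluster` parameter, are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(sentences):
    """Toy embedding (assumption): L2-normalized bag-of-words counts."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def centroid(vectors, members):
    """Mean vector of the sentences whose indices are in `members`."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Stopping early plays the role of truncating the tree at n_clusters.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = distance(centroid(vectors, clusters[a]),
                             centroid(vectors, clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def summarize(sentences, n_clusters, per_cluster=1):
    """Keep the per_cluster sentences nearest each cluster center, in document order."""
    vectors = embed(sentences)
    n_clusters = max(1, min(n_clusters, len(sentences)))
    chosen = []
    for members in agglomerate(vectors, n_clusters):
        center = centroid(vectors, members)
        ranked = sorted(members, key=lambda i: distance(vectors[i], center))
        chosen.extend(ranked[:per_cluster])
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stocks fell sharply on Monday.",
    "Stocks fell again on Tuesday.",
]
summary = summarize(docs, n_clusters=2)
print(summary)
```

With two clusters and one sentence per cluster, the sketch returns one sentence about the cat and one about stocks, which is half the input. Raising `per_cluster` lengthens the summary. For a setup closer to the one the summary reports, the clustering step could be swapped for a library BIRCH implementation such as scikit-learn's `sklearn.cluster.Birch`, and the toy embedding for a trained sentence encoder.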