Unsupervised hierarchical text summarization
Format: | Conference Proceeding |
---|---|
Language: | English |
Summary: | Text summarization is the process of shortening one or more long text documents into a summary of informative sentences presented in an orderly way. Extractive summarization retrieves the important sentences without altering them. Each sentence in the document is represented as a vector of real numbers, termed an embedding. When a sentence is embedded, the values of specific features are analyzed and plotted in an n-dimensional space. The resulting embeddings support effective prediction of the sentences that follow and precede the current sentence, so sentences with similar semantics lie close to each other. Unsupervised summarization groups similar sentences by estimating the distance between their vectors and decides whether each sentence should be included in the summary. Hierarchical summarization constructs a tree structure from the input text, and the tree is truncated according to the number of clusters. Each cluster holds semantically similar sentences. By determining the nearest neighbor in each cluster, a specified number of sentences is retrieved from each cluster for inclusion in the summary, which holds at least half the size of the input document(s). The performance of hierarchical summarization is measured on the CNN/Daily Mail dataset using performance metrics. The evaluation scores indicate that the BIRCH algorithm performs better than the other hierarchical clustering techniques. |
---|---|
ISSN: | 0094-243X, 1551-7616 |
DOI: | 10.1063/5.0116918 |
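The pipeline described in the summary (embed sentences, cluster them hierarchically, cut the tree at a chosen number of clusters, then keep the sentence nearest each cluster's center) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy 2-D vectors stand in for real sentence embeddings from an encoder that is not shown. Average-linkage agglomerative clustering is used as one example of hierarchical clustering: stopping the bottom-up merges at `n_clusters` corresponds to truncating the tree at that many clusters. "Nearest neighbor" is interpreted here as the sentence closest to its cluster's centroid. The helper names `centroid`, `average_linkage`, `agglomerative`, `summarize`, and the `per_cluster` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def average_linkage(a: List[int], b: List[int], emb: List[Vector]) -> float:
    """Mean pairwise Euclidean distance between the members of clusters a and b."""
    total = sum(math.dist(emb[i], emb[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(emb: List[Vector], n_clusters: int) -> List[List[int]]:
    """Bottom-up clustering: start with singletons and repeatedly merge the
    closest pair. Stopping at n_clusters is equivalent to cutting the tree
    at that level. (Hypothetical helper; average linkage is an assumption.)"""
    clusters = [[i] for i in range(len(emb))]
    while len(clusters) > n_clusters:
        best = None  # (distance, x, y) with x < y
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], emb)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x, so index x is unaffected
    return clusters


def summarize(sentences: List[str], emb: List[Vector],
              n_clusters: int, per_cluster: int = 1) -> List[str]:
    """Pick the per_cluster sentences closest to each cluster centroid,
    then return them in original document order."""
    selected = []
    for cluster in agglomerative(emb, n_clusters):
        center = centroid([emb[i] for i in cluster])
        ranked = sorted(cluster, key=lambda i: math.dist(emb[i], center))
        selected.extend(ranked[:per_cluster])
    return [sentences[i] for i in sorted(selected)]


# Hypothetical toy data: 2-D vectors standing in for real sentence embeddings.
sentences = [
    "The cat sat on the mat.",
    "The cat then fell asleep.",
    "Stocks fell on Monday.",
    "Markets dropped sharply.",
    "A kitten napped nearby.",
    "Shares slid across Asia.",
]
emb = [[0.0, 1.0], [0.1, 0.9], [5.0, 0.0], [5.2, 0.1], [0.05, 1.1], [5.1, 0.05]]

print(summarize(sentences, emb, n_clusters=2))
# → ['The cat sat on the mat.', 'Shares slid across Asia.']
```

The selected sentences are returned in their original document order so the summary reads in sequence, and `per_cluster` is where the summary length would be tuned. Because the summary reports BIRCH as the best-performing method, a BIRCH clusterer such as scikit-learn's `sklearn.cluster.Birch(n_clusters=...)` could take the place of `agglomerative`. The merge loop above is roughly O(n³) in the number of sentences and is only meant for small examples.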