A novel approach to capture the similarity in summarized text using embedded model
Published in: | International journal on smart sensing and intelligent systems 2022-01, Vol.15 (1), p.1-20 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Near-duplicate textual content poses great challenges for information extraction, which makes the detection of near duplicates a prime research concern. Existing research mostly uses text clustering, classification, and retrieval algorithms to detect near duplicates. Text summarization, an important text-mining tool, has not yet been explored for this purpose. Instead of the whole document, the proposed method uses its summary, which saves both time and storage. Experimental results show that traditional similarity algorithms captured similarity to a great extent even on the summarized text, with a similarity score of 44.685%. Moreover, the degree of similarity captured was greater (0.52%) when embedding models with better text representation were used instead of traditional methods. The paper also reviews the research status of various similarity measures in terms of the concepts involved, their merits, and their demerits. |
---|---|
ISSN: | 1178-5608 |
DOI: | 10.21307/ijssis-2022-0002 |
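
The idea in the summary, comparing documents through their summaries rather than their full text, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' method: the `summarize` helper is a naive frequency-based extractive summarizer, and Jaccard and cosine similarity stand in for the "traditional similarity algorithms" the record mentions without naming. An embedding-based variant would replace the term-count vectors with learned vectors and is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(text, max_sentences=2):
    """Hypothetical extractive summarizer (an illustrative assumption):
    keep the sentences whose words are most frequent in the document,
    returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        # Average document-level frequency of the sentence's tokens.
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the two term-count vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summary_similarity(doc_a, doc_b, max_sentences=2):
    """Compare two documents through their summaries instead of their full text."""
    sum_a = summarize(doc_a, max_sentences)
    sum_b = summarize(doc_b, max_sentences)
    return {
        "jaccard": jaccard_similarity(sum_a, sum_b),
        "cosine": cosine_similarity(sum_a, sum_b),
    }
```

Calling `summary_similarity(doc_a, doc_b)` on two candidate near duplicates returns both scores computed on the summaries only, which is where the time and storage savings described in the summary would come from, since the vectors are built from a few sentences rather than the whole document.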