Text documents streams with improved incremental similarity

Bibliographic Details
Published in: Social Network Analysis and Mining, 2021-12, Vol. 11 (1), p. 113, Article 113
Main Authors: Sarmento, Rui Portocarrero; O. Cardoso, Douglas; Dearo, Kemmily; Brazdil, Pavel; Gama, João
Format: Article
Language: English
Description
Summary: The research community has made a significant effort to provide methods for organizing documentation with the help of Information Retrieval techniques. In this paper, we present several experiments that use stream analysis methods to explore streams of text documents. We also present possible architectures for Text Document Stream Organization that use incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture achieves significant improvements in the efficiency of grouping similar documents. These improvements matter because analyzing large amounts of text is widely recognized as a high-dimensional and complex problem in data analysis.
ISSN: 1869-5450, 1869-5469
DOI: 10.1007/s13278-021-00826-z