Preprocessing of Slovak Blog Articles for Clustering


Bibliographic Details
Main Authors: Kuzar, Tomas; Navrat, Pavol
Format: Conference Proceeding
Language: English
Description
Summary: Web content clustering is an important part of the topic detection and tracking problem. In our paper, we focus on the pre-processing phase of web content clustering, specifically for blog articles published in the Slovak language. We evaluate the impact of different data pre-processing methods on the success of blog clustering. We found that applying various text data manipulation techniques during pre-processing can improve the quality of clusters. Cluster quality is measured by traditional clustering metrics such as precision, recall, and F-measure.
DOI: 10.1109/WI-IAT.2010.273
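
The summary names precision, recall, and F-measure as the cluster quality metrics but does not say which formulation the authors used. As an illustration only, the sketch below uses the common pair-counting variant: every pair of documents is checked for whether it shares a cluster and whether it shares a reference topic. The function name `pairwise_prf` and the toy labels are hypothetical and do not come from the paper.

```python
from itertools import combinations


def pairwise_prf(predicted, gold):
    """Pair-counting precision, recall, and F-measure for a clustering.

    predicted: cluster id assigned to each document
    gold:      reference topic label of each document
    (Assumed formulation; the paper may compute these metrics differently.)
    """
    if len(predicted) != len(gold):
        raise ValueError("label lists must have equal length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_cluster = predicted[i] == predicted[j]
        same_topic = gold[i] == gold[j]
        if same_cluster and same_topic:
            tp += 1  # pair correctly grouped together
        elif same_cluster:
            fp += 1  # pair grouped together but topics differ
        elif same_topic:
            fn += 1  # pair shares a topic but was split apart

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example: four blog articles, two clusters, two reference topics.
precision, recall, f_measure = pairwise_prf([0, 0, 1, 1], ["a", "a", "a", "b"])
print(precision, recall, f_measure)
```

Comparing these scores across runs with different pre-processing steps (for example, with and without stop-word removal) is one way to quantify the kind of effect the summary describes.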