Proximity Estimation and Hardness of Short-Text Corpora
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. Our approach attempts to establish a connection between the hardness of a corpus and the precision level exhibited by similarity measures, according to the results obtained with different cluster validity measures on the "ideal" clustering of each corpus. We also propose a new validity measure, named contiguity error, that allowed us to observe this connection consistently across all the collections considered. |
---|---|
ISSN: | 1529-4188 2378-3915 |
DOI: | 10.1109/DEXA.2008.87 |