Fuzzy named entity-based document clustering

Bibliographic Details
Main Authors: Cao, T.H., Do, H.T., Hong, D.T., Quan, T.T.
Format: Conference Proceeding
Language: English
Description
Summary: Traditional keyword-based document clustering techniques are limited by their simple treatment of words and their hard separation of clusters. In this paper, we introduce named entities as objectives into fuzzy document clustering; named entities are the key elements defining document semantics and in many cases are of concern to users. First, the traditional keyword-based vector space model is adapted so that vectors are defined over spaces of entity names, types, name-type pairs, and identifiers instead of keywords. Then, hierarchical fuzzy document clustering is performed using a similarity measure on the vectors representing documents. To evaluate fuzzy clustering quality, we propose a fuzzy information variation measure for comparing two fuzzy partitions. Experimental results are presented and discussed.
ISSN: 1098-7584
DOI: 10.1109/FUZZY.2008.4630648
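
Illustration: The entity-based vector space model mentioned in the summary can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes each document has already been annotated with (name, type) entity pairs, uses raw occurrence counts as weights, and uses cosine similarity as the similarity measure. The summary does not state the paper's weighting scheme or similarity measure, so both are assumptions here, and the function names `entity_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def entity_vector(entities, space="pair"):
    """Build a sparse count vector over one entity space.

    entities: list of (name, type) tuples found in a document.
    space: "name", "type", or "pair" (name-type pairs).
    Raw counts are an assumed weighting, chosen for simplicity.
    """
    if space == "name":
        keys = [name for name, _ in entities]
    elif space == "type":
        keys = [etype for _, etype in entities]
    elif space == "pair":
        keys = list(entities)
    else:
        raise ValueError(f"unknown entity space: {space!r}")
    return Counter(keys)


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two toy documents (hypothetical data): same names, one differing type.
doc_a = [("Paris", "City"), ("France", "Country")]
doc_b = [("Paris", "Person"), ("France", "Country")]

for space in ("name", "type", "pair"):
    sim = cosine_similarity(entity_vector(doc_a, space),
                            entity_vector(doc_b, space))
    print(space, round(sim, 3))
```

In this toy data the two documents look identical in the name space but only partly similar in the type and name-type spaces, which shows why the choice of entity space matters for the clustering step.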