Unsupervised clustering for nontextual web document classification
Published in: | Decision Support Systems 2004-06, Vol.37 (3), p.377-396 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | While the breadth of vocabulary used in long documents may mislead traditional keyword-based retrieval systems, demand is mounting for techniques for nontextual Web classification and retrieval from large document collections. Only a few prototype systems have attempted to classify hypertext on the basis of nontextual elements in order to locate unfamiliar documents. As a result, a large portion of Web documents that are pictorial in nature is far beyond the reach of most current search engines. In this research, we devise a novel quantitative model of nontextual World Wide Web (WWW) classification based on image information. An intelligent, content-sensitive, attribute-rich image classifier is presented. An image similarity measure is used to deduce the likelihood among images. Different image feature vectors have been constructed and evaluated. Evaluation shows that images judged to be similar by humans form interesting clusters in our unsupervised learning. Comparison with other clustering techniques, such as Hierarchical Agglomerative Clustering (HAC), demonstrates that our approach is useful in content-based image information retrieval. |
---|---|
ISSN: | 0167-9236, 1873-5797 |
DOI: | 10.1016/S0167-9236(03)00035-6 |
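
The summary names Hierarchical Agglomerative Clustering (HAC) as the comparison technique and mentions image feature vectors and an image similarity measure, without giving details. As a rough illustration only, the sketch below runs average-linkage HAC over toy feature vectors. The 3-bin color histograms, the histogram-intersection similarity, the average-linkage rule, and every function name here are assumptions made for the example; none of them is taken from the article, and this is not the authors' model.

```python
from itertools import combinations


def histogram_intersection(a, b):
    """Similarity in [0, 1] for two normalized histograms of equal length."""
    return sum(min(x, y) for x, y in zip(a, b))


def average_linkage(c1, c2, features):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(histogram_intersection(features[i], features[j])
                for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(features, num_clusters):
    """Merge the two most similar clusters until num_clusters remain."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > num_clusters:
        # Pick the pair of clusters with the highest average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], features),
        )
        # combinations() guarantees a < b, so deleting b keeps index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical 3-bin color histograms (each sums to 1) for five images.
features = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.1, 0.8],
]
clusters = hac(features, num_clusters=3)
print(clusters)
```

On this toy input, images 0 and 1 end up in one cluster, images 2 and 3 in another, and image 4 stays on its own. A real system would replace the hand-written histograms with features extracted from actual images.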