
Classification of RSS feed news items using ontology

Bibliographic Details
Main Authors: Agarwal, S., Singhal, A., Bedi, P.
Format: Conference Proceeding
Language: English
Description
Summary: The explosive growth of data on the web demands techniques that enable users to access the information they want. Document classification is a prerequisite for information retrieval, and many classification techniques have been and remain in use. Term Frequency-Inverse Document Frequency (TF-IDF) represents documents by the frequency of the terms they contain. Its limitations are the high dimensionality of the resulting data and the fact that it ignores relations among terms, which makes the end result noisy and less precise. Our approach uses weighted Concept Frequency-Inverse Document Frequency (CF-IDF), with a domain ontology as background knowledge, to classify RSS feed news items. Metadata of the news items is used to assign weights to the identified concepts. No trained classifier is required, because the ontology itself acts as the classifier. We designed the ontology based on news industry standards. The approach considers relations among concepts and properties, which reduces noise in the final output. It also considers only the key concepts of a domain rather than all terms, which curbs the dimensionality problem. Evaluation of the experimental results reveals that the proposed approach gives better classification results.
ISSN: 2164-7143, 2164-7151
DOI: 10.1109/ISDA.2012.6416587
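
The weighted CF-IDF idea described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their formula, ontology, or weighting scheme. The toy `ONTOLOGY`, the `FIELD_WEIGHTS` values, and the function names below are all hypothetical, and the scoring assumes the common form of CF-IDF (weighted concept frequency normalized per item, multiplied by log(N / number of items containing the concept)).

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: category -> concept -> surface labels.
# A real system would load a domain ontology instead.
ONTOLOGY = {
    "sport": {"football": ["football", "soccer"],
              "tournament": ["tournament", "cup"]},
    "economy": {"market": ["market", "stock"],
                "bank": ["bank"]},
}

# Assumed metadata weights: a concept in the title counts double.
FIELD_WEIGHTS = {"title": 2.0, "description": 1.0}


def concept_counts(item):
    """Return weighted concept occurrences for one news item."""
    counts = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        tokens = re.findall(r"[a-z]+", item.get(field, "").lower())
        for concepts in ONTOLOGY.values():
            for concept, labels in concepts.items():
                hits = sum(tokens.count(label) for label in labels)
                if hits:
                    counts[concept] += weight * hits
    return counts


def cf_idf(items):
    """Return one {concept: CF-IDF score} vector per news item."""
    per_item = [concept_counts(item) for item in items]
    n_items = len(items)
    # Number of items in which each concept occurs at least once.
    doc_freq = Counter(c for counts in per_item for c in counts)
    vectors = []
    for counts in per_item:
        total = sum(counts.values())
        vectors.append({
            c: (w / total) * math.log(n_items / doc_freq[c])
            for c, w in counts.items()
        })
    return vectors


def classify(vector):
    """Pick the category whose concepts score highest, or None if no match."""
    scores = {
        category: sum(vector.get(c, 0.0) for c in concepts)
        for category, concepts in ONTOLOGY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


items = [
    {"title": "Football cup final", "description": "The football tournament ends"},
    {"title": "Stock market falls", "description": "Bank shares drop"},
    {"title": "Bank raises rates", "description": "Market reacts"},
]
vectors = cf_idf(items)
labels = [classify(v) for v in vectors]
```

Because only ontology concepts are counted, each vector has at most as many dimensions as the ontology has concepts, which is the dimensionality reduction the summary refers to. Note that with plain log(N / df), a concept present in every item scores zero; a smoothed variant may be preferable in practice.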