
An information-theoretic measure of term specificity

Bibliographic Details
Published in: Journal of the American Society for Information Science, 1992-01, Vol. 43 (1), p. 54-61
Main Authors: Wong, S. K. M.; Yao, Y. Y.
Format: Article
Language: English
Description
Summary: The inverse document frequency (IDF) and signal‐noise ratio (S/N) approaches are two well-known term weighting schemes based on term specificity. However, the existing justifications for these methods are still somewhat inconclusive and are sometimes based on incompatible assumptions. Although both methods are related to term specificity, their relationship has not been thoroughly investigated. This study introduces an information‐theoretic measure of term specificity. It shows explicitly that the IDF weighting scheme can be derived from the proposed approach by assuming that the frequency of occurrence of each index term is uniform within the set of documents containing the term. The information‐theoretic interpretation of term specificity also establishes the relationship between the IDF and S/N methods. © 1992 John Wiley & Sons, Inc.
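
The uniform-frequency condition mentioned in the summary can be illustrated with a small sketch. The code below is not taken from the article: it assumes a hypothetical specificity score of the form log N minus the Shannon entropy of a term's occurrence distribution over documents, and the names `entropy`, `entropy_specificity`, and `idf` are illustrative only. Under that assumption, a term spread uniformly over n documents has entropy log n, so the score reduces to log(N/n), the familiar IDF weight.

```python
import math


def entropy(freqs):
    """Shannon entropy (natural log) of a term's occurrences across documents."""
    total = sum(freqs)
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)


def entropy_specificity(freqs, n_docs):
    """Hypothetical score (an assumption, not the article's definition):
    log N minus the entropy of the term's distribution over documents."""
    return math.log(n_docs) - entropy(freqs)


def idf(doc_freq, n_docs):
    """Classic IDF weight: log(N / n)."""
    return math.log(n_docs / doc_freq)


N = 1000                   # collection size (made-up value)
uniform = [3] * 10         # term occurs 3 times in each of 10 documents
skewed = [21] + [1] * 9    # same 10 documents, occurrences concentrated in one

print(entropy_specificity(uniform, N), idf(10, N))
print(entropy_specificity(skewed, N))
```

With the uniform frequencies the two printed values agree up to floating-point error; with the skewed frequencies the entropy falls below log 10, so the hypothetical score exceeds the IDF value for the same document frequency.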
ISSN: 0002-8231; 1097-4571
DOI: 10.1002/(SICI)1097-4571(199201)43:1<54::AID-ASI5>3.0.CO;2-A