An information-theoretic measure of term specificity
Published in: | Journal of the American Society for Information Science 1992-01, Vol.43 (1), p.54-61 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The inverse document frequency (IDF) and signal‐noise ratio (S/N) approaches are two well known term weighting schemes based on term specificity. However, the existing justifications for these methods are still somewhat inconclusive and sometimes even based on incompatible assumptions. Although both methods are related to term specificity, their relationship has not been thoroughly investigated. An information‐theoretic measure for term specificity is introduced in this study. It is explicitly shown that the IDF weighting scheme can be derived from the proposed approach by assuming that the frequency of occurrence of each index term is uniform within the set of documents containing the term. The information‐theoretic interpretation of term specificity also establishes the relationship between the IDF and S/N methods. © 1992 John Wiley & Sons, Inc. |
---|---|
ISSN: | 0002-8231 1097-4571 |
DOI: | 10.1002/(SICI)1097-4571(199201)43:1<54::AID-ASI5>3.0.CO;2-A |
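
The summary states that IDF follows from an information-theoretic specificity measure once a term's occurrences are assumed to be uniform across the documents that contain it. The record does not give the paper's formula, so the sketch below is only one plausible reading: it assumes specificity is `log2(N)` minus the entropy of the term's frequency distribution over documents. The names `entropy`, `specificity`, and `idf` are hypothetical and chosen for illustration.

```python
import math


def entropy(freqs):
    """Shannon entropy (bits) of a term's occurrences spread over documents."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def specificity(freqs, n_docs):
    """Assumed measure: log2(N) minus the entropy of the term's distribution.

    This is an illustrative definition, not necessarily the one in the paper.
    """
    return math.log2(n_docs) - entropy(freqs)


def idf(doc_freq, n_docs):
    """Classic inverse document frequency, log2(N / n_t)."""
    return math.log2(n_docs / doc_freq)


n_docs = 8
uniform = [3, 3]  # term occurs equally often in 2 of the 8 documents
skewed = [5, 1]   # same 2 documents, but occurrences are concentrated in one

uniform_score = specificity(uniform, n_docs)
skewed_score = specificity(skewed, n_docs)
idf_score = idf(len(uniform), n_docs)

print(uniform_score, idf_score, skewed_score)
```

Under this assumed definition, the uniform case reduces to `log2(N / n_t)`, which is exactly IDF, while a skewed distribution scores higher because its entropy is lower.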