INCLUSION OF RELEVANCE INFORMATION IN THE TERM DISCRIMINATION MODEL
Published in: | Journal of Documentation, 1989-06, Vol. 45 (2), pp. 85-109 |
---|---|
Format: | Article |
Language: | English |
Summary: | The term discrimination value of an index term has been proposed as a quantitative measure of the extent to which that term can discriminate between documents in bibliographic databases. Previous work has suggested that the most discriminating terms are those with medium frequencies of occurrence. This paper discusses the effect of including relevance data on the calculation of term discrimination values. Two algorithms are described that calculate the ability of index terms to discriminate between relevant documents, between non-relevant documents or between relevant and non-relevant documents. The application of these algorithms to several standard document test collections demonstrates that the exact form of the relationship between term frequency and term discrimination depends upon the particular type of discrimination which is being measured; in particular, medium frequency terms are not necessarily the best discriminators when relevance data is available. These results are compared with the discriminatory ability of terms as measured by their relevance weights, where the most discriminating terms are those with low frequencies of occurrence. |
ISSN: | 0022-0418, 1758-7379 |
DOI: | 10.1108/eb026840 |
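
As background to the summary, the following is a minimal sketch of the classic centroid-based term discrimination value, computed without relevance data. It is not either of the two relevance-based algorithms the paper describes, which this record does not specify. It assumes documents are equal-length term-weight vectors and uses cosine similarity; the function names `cosine`, `space_density` and `discrimination_values` are hypothetical. A term's value is the change in average document-to-centroid similarity when that term is removed, so a positive value suggests that the term helps to separate documents.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Average similarity between each document and the collection centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(doc, centroid) for doc in docs) / n


def discrimination_values(docs):
    """Density with term k removed minus the baseline density, for every term k."""
    baseline = space_density(docs)
    values = []
    for k in range(len(docs[0])):
        # Drop column k from every document vector.
        reduced = [doc[:k] + doc[k + 1:] for doc in docs]
        values.append(space_density(reduced) - baseline)
    return values


# Toy collection (an assumption for illustration): rows are documents,
# columns are terms. Term 0 occurs in both documents; terms 1 and 2
# occur in one document each.
toy_docs = [
    [1, 1, 0],
    [1, 0, 1],
]
print(discrimination_values(toy_docs))
```

In this toy collection the term shared by both documents receives a negative value, and the two terms that each occur in a single document receive equal positive values.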