Optimal estimator of hypothesis probability for data mining problems with small samples
Published in: | International journal of applied mathematics and computer science 2012-09, Vol.22 (3), p.629-645 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The paper presents a new (to the best of the authors’ knowledge) estimator of probability called the “Eph √ 2 completeness estimator” along with a theoretical derivation of its optimality. The estimator is especially suitable for small samples, which are typical of many real problems characterized by data insufficiency. The control parameter of the estimator is not assumed in an a priori, subjective way but is determined on the basis of an optimization criterion (the least absolute errors). The estimator was compared with the universally used frequency estimator of probability and with Cestnik’s m-estimator with respect to accuracy. The comparison was carried out both theoretically and experimentally. The results show the superiority of the Eph √ 2 completeness estimator over the frequency estimator for the probability interval p ∈ (0.1, 0.9). The frequency estimator is better for p ∈ [0, 0.1] and p ∈ [0.9, 1]. |
ISSN: | 2083-8492, 1641-876X |
DOI: | 10.2478/v10006-012-0048-z |
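
The summary compares the new estimator against two baselines, the frequency estimator and Cestnik’s m-estimator. This record does not reproduce the formula of the completeness estimator itself, so the minimal Python sketch below covers only the two baselines in their standard textbook forms. The function names, the value of `m`, and the prior of 0.5 are illustrative assumptions, not values taken from the paper.

```python
def frequency_estimate(n_h: int, n: int) -> float:
    """Relative frequency n_h / n of confirmations in a sample of n items."""
    if n <= 0 or not 0 <= n_h <= n:
        raise ValueError("need n > 0 and 0 <= n_h <= n")
    return n_h / n


def m_estimate(n_h: int, n: int, m: float, prior: float = 0.5) -> float:
    """Cestnik's m-estimate: (n_h + m * prior) / (n + m).

    m and prior are chosen by the user; the values used below are
    illustrative assumptions, not taken from the paper.
    """
    if n < 0 or not 0 <= n_h <= n:
        raise ValueError("need n >= 0 and 0 <= n_h <= n")
    if m <= 0:
        raise ValueError("m must be positive")
    return (n_h + m * prior) / (n + m)


if __name__ == "__main__":
    # A single confirming item: the frequency estimate jumps to the
    # extreme value 1.0, while the m-estimate stays closer to the prior.
    print(frequency_estimate(1, 1))
    print(m_estimate(1, 1, m=2.0))
```

With one confirming item out of one, the frequency estimate is already at its extreme value, which illustrates why smoothed estimators are of interest for small samples.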