Evaluating the performance of detection algorithms in digital mammography
Published in: | Medical physics (Lancaster) 1999-02, Vol.26 (2), p.267-275 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | The initial and relative evaluation of computer methodologies developed for assisting diagnosis in mammography is usually done by comparing the computer output to ground truth data provided by experts and/or biopsy. Reported studies, however, give little information on how the performance indices of computer assisted diagnosis (CAD) algorithms are determined in this initial stage of evaluation. Several strategies exist in the estimation of the true positive (TP) and false positive (FP) rates with respect to ground truth. Adopting one strategy over another yields different performance rates that can be over- or underestimates of the true performance. Furthermore, the estimation of pairs of TP and FP rates gives a partial picture of the performance of an algorithm. It is shown in this work that new performance indices are needed to fully describe the degree of detection (part or whole) and the type of detection (single calcification, cluster of calcifications, mass, or artifact). Several evaluation strategies were tested. The one that yielded the most realistic performances included the following criteria: The detected area should be at least 50% of the true area and no more than four times the true area in order to be considered TP. At least three true calcifications should be detected to within 1 cm² with nearest neighbor distances of less than √2 cm for a cluster to be considered TP. Separate detection measures should be established and used for artifacts and naturally occurring structures to maximize the benefits of the evaluation. Finally, it is critical that CAD investigators provide information on the tested image set as well as the criteria used for the evaluation of the algorithms to allow comparisons and better understanding of their methodologies. |
---|---|
ISSN: | 0094-2405 2473-4209 |
DOI: | 10.1118/1.598514 |
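
The two true-positive criteria quoted in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `area_is_tp` and `cluster_is_tp` are hypothetical, areas are assumed to be scalar values in the same units, calcification positions are assumed to be (x, y) coordinates in centimetres, and "within 1 cm²" is interpreted here as an axis-aligned 1 cm × 1 cm box, which the abstract does not specify.

```python
import math
from itertools import combinations


def area_is_tp(detected_area, true_area, min_ratio=0.5, max_ratio=4.0):
    """Area criterion: detected area is at least 50% of the true area
    and no more than four times the true area (thresholds from the summary)."""
    if true_area <= 0:
        raise ValueError("true_area must be positive")
    ratio = detected_area / true_area
    return min_ratio <= ratio <= max_ratio


def cluster_is_tp(points_cm, min_count=3, box_cm=1.0, max_nn_cm=math.sqrt(2)):
    """Cluster criterion, under the assumptions stated above.

    points_cm holds (x, y) positions, in cm, of detected calcifications that
    match true ones. Returns True if some group of min_count points fits in a
    box_cm x box_cm axis-aligned box (an assumed reading of "within 1 cm^2")
    and every point in the group has a nearest neighbour closer than max_nn_cm.
    """
    for subset in combinations(points_cm, min_count):
        xs = [p[0] for p in subset]
        ys = [p[1] for p in subset]
        # Skip groups whose bounding box exceeds the assumed 1 cm x 1 cm region.
        if max(xs) - min(xs) > box_cm or max(ys) - min(ys) > box_cm:
            continue
        # Check each point's nearest-neighbour distance within the group.
        nn_ok = all(
            min(math.dist(p, q) for j, q in enumerate(subset) if j != i) < max_nn_cm
            for i, p in enumerate(subset)
        )
        if nn_ok:
            return True
    return False
```

The brute-force search over point triples is chosen for readability; it is adequate for the handful of calcifications in a typical cluster but would need a spatial index for large point sets.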