Evaluating the performance of detection algorithms in digital mammography

Bibliographic Details
Published in: Medical physics (Lancaster) 1999-02, Vol.26 (2), p.267-275
Main Authors: Kallergi, Maria; Carney, Gregory M.; Gaviria, Jorge
Format: Article
Language: English
Description
Summary: The initial and relative evaluation of computer methodologies developed for assisting diagnosis in mammography is usually done by comparing the computer output to ground truth data provided by experts and/or biopsy. Reported studies, however, give little information on how the performance indices of computer assisted diagnosis (CAD) algorithms are determined in this initial stage of evaluation. Several strategies exist in the estimation of the true positive (TP) and false positive (FP) rates with respect to ground truth. Adopting one strategy over another yields different performance rates that can be over- or underestimates of the true performance. Furthermore, the estimation of pairs of TP and FP rates gives a partial picture of the performance of an algorithm. It is shown in this work that new performance indices are needed to fully describe the degree of detection (part or whole) and the type of detection (single calcification, cluster of calcifications, mass, or artifact). Several evaluation strategies were tested. The one that yielded the most realistic performances included the following criteria: The detected area should be at least 50% of the true area and no more than four times the true area in order to be considered TP. At least three true calcifications should be detected to within 1 cm² with nearest neighbor distances of less than √2 cm for a cluster to be considered TP. Separate detection measures should be established and used for artifacts and naturally occurring structures to maximize the benefits of the evaluation. Finally, it is critical that CAD investigators provide information on the tested image set as well as the criteria used for the evaluation of the algorithms to allow comparisons and better understanding of their methodologies.
ISSN: 0094-2405, 2473-4209
DOI: 10.1118/1.598514
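
Illustrative note: the two TP criteria quoted in the summary can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the reading of "detected area" in the 50% clause as the overlap with the true area, and the reading of "within 1 cm²" as an axis-aligned 1 cm × 1 cm window are all assumptions introduced here for illustration. Coordinates and areas are taken to be in cm and cm².

```python
from math import dist, sqrt


def area_criterion(overlap_area, detected_area, true_area):
    """Region-level TP test (hypothetical helper).

    Assumption: the overlap between detection and truth must cover at
    least 50% of the true area, and the whole detected region must be
    no larger than four times the true area.
    """
    if true_area <= 0:
        raise ValueError("true_area must be positive")
    return overlap_area >= 0.5 * true_area and detected_area <= 4.0 * true_area


def cluster_criterion(hits, min_count=3, window_cm=1.0, max_nn_cm=sqrt(2)):
    """Cluster-level TP test (hypothetical helper).

    hits: (x, y) centres, in cm, of true calcifications that the
    algorithm detected. Assumption: "within 1 cm^2" is modelled as an
    axis-aligned square window of side window_cm.
    """
    # A window holding the most points can always be slid until its left
    # and bottom edges touch a point, so only those positions are tried.
    for x0, _ in hits:
        for _, y0 in hits:
            inside = [
                p for p in hits
                if x0 <= p[0] <= x0 + window_cm and y0 <= p[1] <= y0 + window_cm
            ]
            if len(inside) < min_count:
                continue
            # Every point in the window needs a neighbour closer than max_nn_cm.
            nn_ok = all(
                min(dist(p, q) for j, q in enumerate(inside) if j != i) < max_nn_cm
                for i, p in enumerate(inside)
            )
            if nn_ok:
                return True
    return False
```

For example, under these assumptions a detection that overlaps 60 of 100 true area units with a total detected area of 150 passes the area test, while one with a total detected area of 450 does not. Note that inside a 1 cm square any two points are at most √2 cm apart, so the nearest-neighbor condition is strict only at the diagonal extreme.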