Beyond rankings: Learning (more) from algorithm validation

Bibliographic Details
Published in: Medical Image Analysis, 2023-05, Vol. 86, p. 102765-102765, Article 102765
Main Authors: Roß, Tobias, Bruno, Pierangela, Reinke, Annika, Wiesenfarth, Manuel, Koeppel, Lisa, Full, Peter M., Pekdemir, Bünyamin, Godau, Patrick, Trofimova, Darya, Isensee, Fabian, Adler, Tim J., Tran, Thuy N., Moccia, Sara, Calimeri, Francesco, Müller-Stich, Beat P., Kopp-Schneider, Annette, Maier-Hein, Lena
Format: Article
Language: English
Description
Summary: Challenges have become the state-of-the-art approach to benchmarking image analysis algorithms in a comparative manner. While validation on identical data sets was a great step forward, the analysis of results is often restricted to pure ranking tables, leaving relevant questions unanswered. Specifically, little effort has been put into systematically investigating what characterizes the images on which state-of-the-art algorithms fail. To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Our framework relies on the semantic metadata annotation of images, which serves as the foundation for a General Linear Mixed Models (GLMM) analysis. Based on 51,542 metadata annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge 2019 and revealed underexposure, motion and occlusion of instruments, as well as the presence of smoke or other objects in the background, as major sources of algorithm failure. Our subsequent method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in processing images on which previous methods tended to fail. Due to the objectivity and generic applicability of our approach, it could become a valuable tool for validation in the field of medical image analysis and beyond.
• Challenge analysis beyond rankings: An approach to systematically learning from challenges in order to tailor algorithm development.
• Proof of concept: Instantiating the concept for the ROBUST-MIS challenge proves the value of the approach.
• Impact: The method is broadly applicable to various tasks in the field of medical image analysis.
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2023.102765