
Robust machine learning challenge: An AIFM multicentric competition to spread knowledge, identify common pitfalls and recommend best practice

Bibliographic Details
Published in:Physica medica 2024-11, Vol.127, p.104834, Article 104834
Main Authors: Maddalo, Michele, Fanizzi, Annarita, Lambri, Nicola, Loi, Emiliano, Branchini, Marco, Lorenzon, Leda, Giuliano, Alessia, Ubaldi, Leonardo, Saponaro, Sara, Signoriello, Michele, Fadda, Federico, Belmonte, Gina, Giannelli, Marco, Talamonti, Cinzia, Iori, Mauro, Tangaro, Sabina, Massafra, Raffaella, Mancosu, Pietro, Avanzo, Michele
Format: Article
Language:English
Description
Summary:
•AI4MP-Challenge is the first AIFM multicentric experience on machine learning.
•The main objective is to improve the knowledge and skills of medical physicists in machine learning.
•Encountered pitfalls: violation of the independence assumption, computation errors, data imbalance.
•Providing both cross-validation and an independent test helps to detect implementation issues.
•The exclusion of non-robust features does not significantly increase model stability.

A novel and unconventional approach to a machine learning challenge was designed to spread knowledge, identify robust methods and highlight potential pitfalls of machine learning within the Medical Physics community. A public dataset comprising 41 radiomic features and 535 patients was used to assess the potential of radiomics to distinguish primary lung tumors from metastases. Each participant developed two classification models, using (i) all features (base model) and (ii) only robust features (robust model). Both models were validated with cross-validation and on unseen data. The population stability index (PSI) was used as a diagnostic metric for implementation issues. Performance was compared to a reference. Base and robust models were compared in terms of performance and stability (coefficient of variation (CoV) of the prediction probabilities). PSI detected potential implementation errors in 70 % of models. The dataset exhibited strong class imbalance. The average Gmean (an appropriate metric under imbalance) among all participants was 0.67 ± 0.01, significantly higher than the reference Gmean of 0.50 ± 0.04. Robust model performances were slightly worse than those of base models (p
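The abstract leans on two metrics: the population stability index (PSI), used to flag implementation issues between cross-validation and test-set prediction distributions, and the geometric mean (Gmean) of sensitivity and specificity, which is robust to class imbalance. The paper's own implementation is not given here; the following is a minimal sketch of both metrics using their standard definitions (PSI = Σᵢ (aᵢ − eᵢ)·ln(aᵢ/eᵢ) over binned score fractions; Gmean = √(sensitivity × specificity)), with binning choices and the epsilon guard being illustrative assumptions.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between two score distributions (standard binned definition).

    Bin edges are taken from the 'expected' (reference) scores; both
    distributions are reduced to per-bin fractions a_i and e_i, and
    PSI = sum_i (a_i - e_i) * ln(a_i / e_i). A common rule of thumb
    reads PSI > 0.25 as a substantial distribution shift.
    """
    edges = np.histogram_bin_edges(np.asarray(expected, float), bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Clip empty-bin fractions to a small epsilon to avoid log(0)/division by zero.
    eps = 1e-6
    e_frac = np.clip(e_counts / e_counts.sum(), eps, None)
    a_frac = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

def gmean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels 0/1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return float(np.sqrt(sensitivity * specificity))
```

Identical score distributions yield PSI = 0, and a trivial majority-class classifier (zero sensitivity or zero specificity) yields Gmean = 0, which is why Gmean is preferred over accuracy on an imbalanced dataset like the one described.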
ISSN:1120-1797
1724-191X
DOI:10.1016/j.ejmp.2024.104834