
Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization

Bibliographic Details
Published in: Physics and Imaging in Radiation Oncology, 2023-04, Vol. 26, p. 100450, Article 100450
Main Authors: Varghese, Amal Joseph, Gouthamchand, Varsha, Sasidharan, Balu Krishna, Wee, Leonard, Sidhique, Sharief K, Rao, Julia Priyadarshini, Dekker, Andre, Hoebers, Frank, Devakumar, Devadhas, Irodi, Aparna, Balasingh, Timothy Peace, Godson, Henry Finlay, Joel, T, Mathew, Manu, Gunasingam Isiah, Rajesh, Pavamani, Simon Pradeep, Thomas, Hannah Mary T
Format: Article
Language: English
Description
Summary:
• This study investigates combining publicly available datasets with single-institution retrospective data to construct radiomic models for loco-regional recurrence (at 2 years) in head and neck cancer, and evaluates their generalizability by validating on an independent dataset.
• Feature selection methods depend on the data: as the data vary, the selected features also vary.
• Different machine learning classifiers handle heterogeneity in the data differently; simple logistic regression based models cannot handle the heterogeneity.
• ComBat normalization might help harmonize some of the centre effects; it is unclear whether inherent biological differences are also harmonized.
• Pooling data from different institutions helps improve the prognostic models, but how much data is required for a good prognostic model is still unclear.
Radiomics models trained with limited single-institution data are often not reproducible or generalisable. We developed radiomics models that predict loco-regional recurrence within two years of radiotherapy using private and public datasets and their combinations, to simulate small and multi-institutional studies and to study the responsiveness of the models to feature selection, machine learning algorithms, centre-effect harmonization and increased dataset sizes.
The study included 562 patients with histologically confirmed locally advanced head-and-neck cancer (LA-HNC) who had been treated, drawn from two public and two private datasets; one private dataset was reserved exclusively for validation. Clinical contours of primary tumours were used without recontouring for Pyradiomics-based feature extraction. ComBat harmonization was applied, and LASSO-Logistic Regression (LR) and Support Vector Machine (SVM) models were built. Predictive performance was reported as the 95% confidence interval (CI) of 1000 bootstrapped areas under the receiver operating characteristic curve (AUC).
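The reporting of a 95% CI over 1000 bootstrapped AUCs can be illustrated with a minimal sketch. It is not the authors' code: it assumes a percentile bootstrap that resamples validation cases with replacement and rescores fixed predictions, which the summary does not specify. The function names `auc` and `bootstrap_auc_ci`, the seed, and the toy labels and scores are all hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed scheme, see lead-in).

    Returns (auc_on_original_data, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return auc(labels, scores), aucs[lo_idx], aucs[hi_idx]


if __name__ == "__main__":
    # Toy data only: 1 = recurrence within two years, 0 = no recurrence.
    y = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60, 0.30, 0.45, 0.50, 0.90]
    point, lower, upper = bootstrap_auc_ci(y, p)
    print(f"AUC {point:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

With a fixed seed the interval is reproducible. A study might instead refit the model inside each bootstrap replicate, which would give a different and usually wider interval.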
The responsiveness of the models' performance to the choice of feature selection method, ComBat harmonization, machine learning classifier, and single versus pooled data was evaluated. LASSO and SelectKBest selected 14 and 16 features, respectively; three features were selected by both. Without ComBat, the LR and SVM models for the three-institution data showed AUCs (CI) of 0.513 (0.481–0.559) and 0.632 (0.586–0.665), respectively. Following ComBat, the AUCs were 0.559 (0.536–0.590) and 0.662 (0.606–0.690), respectively. Compared to single cohort AUCs (0.562–0.629), SVM models from pooled data performed significantly better at
ISSN: 2405-6316
DOI: 10.1016/j.phro.2023.100450