Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization
Published in: | Physics and imaging in radiation oncology 2023-04, Vol.26, p.100450, Article 100450 |
---|---|
Format: | Article |
Language: | English |
Summary: | • This study investigates combining publicly available datasets with single-institution retrospective data to construct radiomic models for loco-regional recurrence (at 2 years) in head and neck cancer, and evaluates their generalizability by validation on an independent dataset.
• Feature selection depends on the data: as the data vary, so do the selected features.
• Different machine learning classifiers handle heterogeneity in the data differently; simple logistic-regression-based models cannot handle the heterogeneity.
• ComBat normalization may harmonize some of the centre effects; whether inherent biological differences are also harmonized is unclear.
• Pooling data from different institutions clearly helps improve the prognostic models; how much data a good prognostic model requires is still unclear.
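To illustrate why the selected features depend on the data, the sketch below runs an L1-penalised (LASSO) logistic regression and scikit-learn's `SelectKBest` on the same synthetic feature matrix and compares the chosen subsets, then re-runs `SelectKBest` on half of the patients. It is a minimal sketch under stated assumptions, not the study's pipeline: the abstract does not give the regularisation strength or the `SelectKBest` scoring function, so `C=0.1` and `f_classif` are assumptions, `k=16` merely mirrors the count reported in the results, and the helper names `lasso_selected` and `kbest_selected` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def lasso_selected(X, y, C=0.1):
    """Indices of features kept (non-zero coefficient) by L1-penalised logistic regression.

    C=0.1 is an assumed regularisation strength, not a value from the study.
    """
    X_std = StandardScaler().fit_transform(X)  # L1 penalty assumes comparable feature scales
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X_std, y)
    return set(np.flatnonzero(model.coef_.ravel()).tolist())


def kbest_selected(X, y, k=16):
    """Indices of the k features with the highest univariate ANOVA F-score (assumed scorer)."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    return set(np.flatnonzero(selector.get_support()).tolist())


# Synthetic stand-in for a radiomic feature matrix (not the study data).
X, y = make_classification(n_samples=300, n_features=100, n_informative=10, random_state=0)
lasso_set = lasso_selected(X, y)
kbest_set = kbest_selected(X, y)
overlap = lasso_set & kbest_set
print(f"LASSO: {len(lasso_set)}, SelectKBest: {len(kbest_set)}, overlap: {len(overlap)}")

# Re-running SelectKBest on half of the patients illustrates data dependence.
kbest_half = kbest_selected(X[:150], y[:150])
print(f"SelectKBest overlap, full vs half data: {len(kbest_set & kbest_half)} of 16")
```

The two selectors apply different criteria (a joint sparse multivariable fit versus independent univariate scores), so partial overlap between their subsets is expected even on identical data.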
Radiomics models trained on limited single-institution data are often neither reproducible nor generalisable. We developed radiomics models that predict loco-regional recurrence within two years of radiotherapy using private and public datasets, alone and in combination, to simulate small and multi-institutional studies and to study how the models respond to feature selection, machine learning algorithm, centre-effect harmonization and increased dataset size.
A total of 562 histologically confirmed and treated patients with locally advanced head-and-neck cancer (LA-HNC) were drawn from two public and two private datasets; one private dataset was reserved exclusively for validation. The clinical contours of the primary tumours were used without recontouring for Pyradiomics-based feature extraction. ComBat harmonization was applied, and LASSO-logistic regression (LR) and support vector machine (SVM) models were built. Predictive performance was reported as the area under the receiver operating characteristic curve (AUC), with a 95% confidence interval (CI) derived from 1000 bootstrapped AUCs. We evaluated how model performance responded to the choice of feature selection method, ComBat harmonization, machine learning classifier, and single versus pooled data.
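The abstract does not describe its ComBat configuration. As an illustration only, the sketch below applies a simplified per-centre location/scale adjustment that captures ComBat's core idea (remove each centre's additive and multiplicative shift in every feature). It is an assumption-laden simplification: it omits the empirical Bayes shrinkage and the biological-covariate terms of full ComBat implementations, and the function name `harmonize_location_scale` is hypothetical.

```python
import numpy as np


def harmonize_location_scale(X, centre):
    """Align each centre's per-feature mean and SD with the pooled values.

    Simplified ComBat-like adjustment (assumption): no biological covariates
    and no empirical Bayes shrinkage of the centre parameters.
    X: (n_patients, n_features); centre: length-n array of centre labels.
    Each centre needs at least two patients.
    """
    X = np.asarray(X, dtype=float)
    centre = np.asarray(centre)
    grand_mean = X.mean(axis=0)
    pooled_sd = X.std(axis=0, ddof=1)
    pooled_sd[pooled_sd == 0] = 1.0  # constant features: avoid division by zero
    Z = (X - grand_mean) / pooled_sd  # standardize with pooled statistics
    out = np.empty_like(Z)
    for c in np.unique(centre):
        idx = centre == c
        loc = Z[idx].mean(axis=0)  # additive centre effect
        scale = Z[idx].std(axis=0, ddof=1)  # multiplicative centre effect
        scale[scale == 0] = 1.0
        out[idx] = (Z[idx] - loc) / scale
    return out * pooled_sd + grand_mean  # back to the original feature scale


# Hypothetical two-centre example: centre B is shifted and more spread out.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 3.0, (40, 3))])
centre = np.array(["A"] * 50 + ["B"] * 40)
X_h = harmonize_location_scale(X, centre)
```

After the adjustment every centre shares the same per-feature mean and SD. Any genuine biological difference between centres (for example, a different tumour-stage mix) is therefore removed along with the technical effect unless it is modelled as a covariate, which is the uncertainty about biological differences noted in the highlights above.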
LASSO and SelectKBest selected 14 and 16 features, respectively, of which three were common to both. Without ComBat, the LR and SVM models trained on data from three institutions achieved AUCs (95% CI) of 0.513 (0.481–0.559) and 0.632 (0.586–0.665), respectively. After ComBat, the AUCs were 0.559 (0.536–0.590) and 0.662 (0.606–0.690), respectively. Compared with single-cohort AUCs (0.562–0.629), SVM models from pooled data performed significantly better at |
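Each AUC above is reported with a 95% CI derived from 1000 bootstrapped AUCs. The sketch below shows one common way to compute such an interval: resample patients with replacement, recompute the AUC, and take the 2.5th and 97.5th percentiles. The percentile method, the skipping of single-class resamples, the 0/1 outcome coding, and the helper names `auc_mann_whitney` and `bootstrap_auc_ci` are assumptions; the abstract does not specify these details.

```python
import numpy as np


def auc_mann_whitney(y_true, y_score):
    """AUC as P(random positive outscores random negative); ties count 1/2. Labels coded 0/1."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all positive-negative score pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) CI from n_boot resamples."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    if np.unique(y_true).size < 2:
        raise ValueError("AUC needs both recurrence (1) and non-recurrence (0) cases")
    rng = np.random.default_rng(seed)
    n = y_true.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # skip resamples containing only one outcome class
        aucs.append(auc_mann_whitney(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_mann_whitney(y_true, y_score), lower, upper


# Hypothetical predictions: 0/1 outcomes and noisy scores that track them.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
scores = y + rng.normal(0.0, 1.0, size=200)
auc, lower, upper = bootstrap_auc_ci(y, scores)
print(f"AUC {auc:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The pairwise Mann-Whitney form keeps the sketch dependency-free; for large cohorts a rank-based AUC (or a library routine) avoids building the full positive-by-negative matrix.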
---|---|
ISSN: | 2405-6316 |
DOI: | 10.1016/j.phro.2023.100450 |