Evaluation of a machine-learning model based on laboratory parameters for the prediction of acute leukaemia subtypes: a multicentre model development and validation study in France
Published in: | The Lancet. Digital health 2024-05, Vol.6 (5), p.e323-e333 |
---|---|
Format: | Article |
Language: | English |
Summary: | Acute leukaemias are life-threatening haematological cancers characterised by the infiltration of transformed immature haematopoietic cells in the blood and bone marrow. Prompt and accurate diagnosis of the three main acute leukaemia subtypes (ie, acute lymphocytic leukaemia [ALL], acute myeloid leukaemia [AML], and acute promyelocytic leukaemia [APL]) is of utmost importance to guide initial treatment and prevent early mortality, but requires cytological expertise that is not always available. We aimed to benchmark different machine-learning strategies using a custom variable selection algorithm, and to propose an extreme gradient boosting model to predict leukaemia subtypes on the basis of routine laboratory parameters.
This multicentre model development and validation study was conducted with data from six independent French university hospital databases. Patients aged 18 years or older diagnosed with AML, APL, or ALL in any one of these six hospital databases between March 1, 2012, and Dec 31, 2021, were recruited. 22 routine parameters were collected at the time of initial disease evaluation; variables with more than 25% missing values in two datasets were not used for model training, leading to the final inclusion of 19 parameters. The performance of the final model was evaluated on the internal testing and external validation sets using the area under the receiver operating characteristic curve (AUC), and clinically relevant cutoffs were chosen to guide clinical decision making. The final tool, Artificial Intelligence Prediction of Acute Leukemia (AI-PAL), was developed from this model.
1410 patients diagnosed with AML, APL, or ALL were included. Data quality control showed few missing values for each cohort, with the exception of uric acid and lactate dehydrogenase for the cohort from Hôpital Cochin. 679 patients from Hôpital Lyon Sud and Centre Hospitalier Universitaire de Clermont-Ferrand were split into the training (n=477) and internal testing (n=202) sets. 731 patients from the four other cohorts were used for external validation. Overall AUCs across all validation cohorts were 0·97 (95% CI 0·95–0·99) for APL, 0·90 (0·83–0·97) for ALL, and 0·89 (0·82–0·95) for AML. Cutoffs were then established on the overall cohort of 1410 patients to guide clinical decisions. Confident cutoffs showed two (0·14%) wrong predictions for ALL, four (0·28%) wrong predictions for APL, and three (0·21%) wrong predictions for AML. Use of the overall cutoff greatly reduced the |
ISSN: | 2589-7500 |
DOI: | 10.1016/S2589-7500(24)00044-X |
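The summary states that variables with more than 25% missing values in two datasets were excluded before model training. A minimal Python sketch of such a rule follows. It is not the authors' code: the function names, the placeholder parameter and centre names, the toy values, and the reading of "in two datasets" as "in at least two datasets" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One patient record: parameter name -> measured value, or None if missing.
Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row], variable: str) -> float:
    """Fraction of rows in which `variable` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(variable) is None)
    return missing / len(rows)


def select_variables(
    datasets: Dict[str, List[Row]],
    variables: List[str],
    max_missing: float = 0.25,  # threshold quoted in the summary
    min_datasets: int = 2,      # assumption: "two datasets" read as "at least two"
) -> List[str]:
    """Keep variables that exceed `max_missing` in fewer than `min_datasets` datasets."""
    kept = []
    for variable in variables:
        n_sparse = sum(
            1
            for rows in datasets.values()
            if missing_fraction(rows, variable) > max_missing
        )
        if n_sparse < min_datasets:
            kept.append(variable)
    return kept


# Hypothetical toy data: three centres, two patients each, placeholder parameters.
toy_datasets = {
    "centre_a": [
        {"param_a": 5.0, "param_b": 300.0, "param_c": None},
        {"param_a": 7.1, "param_b": None, "param_c": None},
    ],
    "centre_b": [
        {"param_a": 4.2, "param_b": 250.0, "param_c": None},
        {"param_a": 9.9, "param_b": 410.0, "param_c": 5.1},
    ],
    "centre_c": [
        {"param_a": 3.3, "param_b": 280.0, "param_c": 4.0},
        {"param_a": 6.0, "param_b": 190.0, "param_c": 6.2},
    ],
}

kept_variables = select_variables(toy_datasets, ["param_a", "param_b", "param_c"])
print(kept_variables)
```

In this toy example `param_b` is sparse in only one centre and is kept, whereas `param_c` exceeds the threshold in two centres and is dropped.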