
Development and Validation of an Interpretable Machine Learning Prediction Model for Total Pathological Complete Response after Neoadjuvant Chemotherapy in Locally Advanced Breast Cancer: Multicenter Retrospective Analysis

Bibliographic Details
Published in: Journal of Cancer 2024, Vol. 15 (15), p. 5058-5071
Main Authors: Zhang, Ziran, Cao, Bo, Wu, Jinghua, Feng, Chengtian
Format: Article
Language: English
Description
Summary: This study aims to develop an interpretable machine learning (ML) model to accurately predict the probability of achieving total pathological complete response (tpCR) in patients with locally advanced breast cancer (LABC) following neoadjuvant chemotherapy (NAC). This multicenter retrospective study included pre-NAC clinicopathological data from 698 LABC patients. Based on postoperative pathological outcomes, patients were divided into tpCR and non-tpCR groups. Data from 586 patients at Shanghai Ruijin Hospital were randomly assigned to a training set (80%) and a test set (20%), whereas data from the remaining 112 patients, treated at our hospital, were used for external validation. Variables were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression. Predictive models were constructed using six ML algorithms, including decision trees, K-nearest neighbors (KNN), support vector machine, light gradient boosting machine, and extreme gradient boosting. Model performance was assessed with receiver operating characteristic (ROC) curves, precision-recall (PR) curves, confusion matrices, calibration plots, and decision curve analysis (DCA), and the best-performing model was selected by comparing these metrics across algorithms. In addition, the SHapley Additive exPlanations (SHAP) technique was used to rank variable importance, improving the model's interpretability and addressing the "black box" problem. A total of 191 patients (32.59%) achieved tpCR following NAC. LASSO regression identified five predictive variables for model construction: tumor size, Ki-67, molecular subtype, targeted therapy, and chemotherapy regimen.
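To make the KNN and ROC steps concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the predictors have already been numerically encoded and standardized, uses Euclidean distance, and treats the predicted probability of tpCR as the fraction of the k nearest training patients who achieved tpCR. The summary does not report k, the distance metric, or the preprocessing, so these choices, the toy data, and the helper names `knn_probability` and `roc_auc` are hypothetical. The AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen tpCR patient receives a higher score than a randomly chosen non-tpCR patient, with ties counted as one half.

```python
import math

# Hypothetical sketch: KNN probability of tpCR plus ROC AUC in rank form.
# Assumes features are already encoded and standardized; k=3 is illustrative only.


def knn_probability(train_X, train_y, x, k=3):
    """Fraction of the k nearest training patients (Euclidean distance) with tpCR (label 1)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training patients")
    # Sort (distance, label) pairs so the closest patients come first.
    ranked = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in ranked[:k]) / k


def roc_auc(y_true, scores):
    """Probability that a random tpCR case outscores a random non-tpCR case (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one tpCR and one non-tpCR case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: two standardized features per patient.
    train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
    train_y = [0, 0, 0, 1, 1, 1]
    test_X = [(0.05, 0.1), (1.0, 1.1), (0.6, 0.5)]
    test_y = [0, 1, 1]
    probs = [knn_probability(train_X, train_y, x) for x in test_X]
    print(probs, roc_auc(test_y, probs))
```

As a quick check of the AUC definition, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four tpCR/non-tpCR pairs are ranked correctly. In practice a tested library implementation and a tuned k would be preferable to this hand-rolled version.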
The KNN model outperformed the other five classifier algorithms, achieving area under the curve (AUC) values of 0.847 (95% CI: 0.809-0.883) in the training set, 0.763 (95% CI: 0.670-0.856) in the test set, and 0.665 (95% CI: 0.555-0.776) in the external validation set. DCA showed that the KNN model yielded the highest net benefit across a wide range of threshold probabilities in both the training and test sets. Furthermore, SHAP analysis of the KNN model identified targeted therapy as the most important factor in predicting tpCR. An ML prediction model based on clinical and pathological data collected before NAC was developed and validated. This model accurately predicted the probability of achieving a tpCR in patients with
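The DCA result in the summary compares models by net benefit across threshold probabilities. Under the standard definition, at threshold probability p_t, net benefit = TP/n - (FP/n) x p_t/(1 - p_t), and it is compared against two reference strategies: treating all patients as likely responders and treating none (net benefit 0). The Python sketch below computes these quantities for a model's predicted tpCR probabilities. It is illustrative only: the outcomes and probabilities are made up, and the function names are hypothetical rather than taken from the study.

```python
# Hypothetical sketch of decision curve analysis (DCA) net benefit.
# Net benefit at threshold pt = TP/n - (FP/n) * pt / (1 - pt).


def net_benefit(y_true, probs, threshold):
    """Net benefit of flagging patients whose predicted tpCR probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every patient as a likely tpCR."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


def decision_curve(y_true, probs, thresholds):
    """Rows of (threshold, model, treat-all, treat-none); treat-none is always 0."""
    return [
        (t, net_benefit(y_true, probs, t), net_benefit_treat_all(y_true, t), 0.0)
        for t in thresholds
    ]


if __name__ == "__main__":
    # Made-up outcomes (1 = tpCR) and predicted probabilities.
    y = [1, 0, 1, 0, 0]
    p = [0.9, 0.6, 0.4, 0.2, 0.1]
    for row in decision_curve(y, p, [0.1, 0.3, 0.5]):
        print("pt=%.1f model=%.3f all=%.3f none=%.1f" % row)
```

With the toy data, at p_t = 0.3 the model's net benefit is 11/35 (about 0.314) versus 1/7 (about 0.143) for treat-all, so the model's curve lies above both reference strategies at that threshold; a model with the "highest net benefit" in the sense of the summary would hold such an advantage over a range of thresholds.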
ISSN: 1837-9664
DOI: 10.7150/jca.97190