Automated data extraction and ensemble methods for predictive modeling of breast cancer outcomes after radiation therapy

Bibliographic Details
Published in:	Medical physics (Lancaster) 2019-02, Vol.46 (2), p.1054-1063
Main Authors:	Lindsay, William D., Ahern, Christopher A., Tobias, Jacob S., Berlind, Christopher G., Chinniah, Chidambaram, Gabriel, Peter E., Gee, James C., Simone, Charles B.
Format:	Article
Language:	English
Subjects:	automated data extraction; ensemble methods; machine learning; predictive modeling; radiotherapy outcomes

Description
Summary:	Purpose: The purpose of this study was to compare the effectiveness of ensemble methods (e.g., random forests) and single‐model methods (e.g., logistic regression and decision trees) in predictive modeling of post-radiotherapy (post-RT) treatment failure and adverse events (AEs) for breast cancer patients, using data automatically extracted from electronic medical records (EMRs).

Methods: Data from 1967 consecutive breast RT courses delivered at one institution between 2008 and 2015 were automatically extracted from EMRs and oncology information systems using extraction software. Over 230 variables were extracted, spanning the following variable segments: patient demographics, medical/surgical history, tumor characteristics, RT treatment history, and AEs tracked using CTCAE v4.0. Treatment failure was extracted algorithmically by searching posttreatment encounters for evidence of local, nodal, or distant failure. Individual models were trained using decision trees, logistic regression, random forests, and boosted decision trees to predict treatment failures and AEs. Models were fit on 75% of the data and evaluated for probability calibration and area under the ROC curve (AUC) on the remaining 25%, held out as a test set. The impact of each variable segment was assessed by retraining the models without that segment and measuring the change in AUC (ΔAUC).

Results: All AUC values were statistically significant (P
ISSN:	0094-2405, 2473-4209
DOI:	10.1002/mp.13314
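The abstract says treatment failure was found by searching posttreatment encounters for evidence of local, nodal, or distant failure, but it does not give the search rules. The following is a minimal sketch of that kind of rule-based search, assuming each encounter is available as a (date, note text) pair. The phrase patterns, the crude negation check, and the names `FAILURE_PATTERNS`, `NEGATION`, and `classify_failures` are hypothetical illustrations, not the study's extraction software.

```python
import re
from datetime import date

# Hypothetical phrase patterns per failure type (NOT the study's actual rules).
FAILURE_PATTERNS = {
    "local": re.compile(r"\b(local|in-breast|chest wall)\s+recurrence\b", re.IGNORECASE),
    "nodal": re.compile(r"\b(nodal|axillary|supraclavicular)\s+(recurrence|metastas[ie]s)\b", re.IGNORECASE),
    "distant": re.compile(r"\b(distant metastas[ie]s|metastatic disease)\b", re.IGNORECASE),
}

# Crude negation cue: "no", "without", or "negative for" earlier in the same sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*$", re.IGNORECASE)


def classify_failures(rt_end: date, encounters):
    """Return the set of failure types mentioned in encounters dated after rt_end.

    encounters: iterable of (encounter_date, note_text) pairs.
    """
    found = set()
    for enc_date, text in encounters:
        if enc_date <= rt_end:
            continue  # only posttreatment encounters are searched
        for kind, pattern in FAILURE_PATTERNS.items():
            for match in pattern.finditer(text):
                # Look back a short window for a negation cue in the same sentence.
                window = text[max(0, match.start() - 40):match.start()]
                if not NEGATION.search(window):
                    found.add(kind)
                    break  # one non-negated mention is enough for this type
    return found
```

For example, with RT ending on 2012-03-01, a 2013 note reading "Axillary recurrence confirmed on biopsy." is flagged as nodal failure, while "no evidence of distant metastases" is skipped by the negation check. A production system would need far more robust clinical text handling than these few patterns.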
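The evaluation described in the abstract (fit on 75% of the courses, then measure AUC and probability calibration on the held-out 25%) can be sketched without any machine-learning library. This is a minimal illustration, assuming binary outcome labels and predicted probabilities in [0, 1]; the helper names `train_test_split`, `auc`, and `calibration_table`, the random seed, and the five equal-width calibration bins are illustrative choices, not details reported by the authors.

```python
import random


def train_test_split(rows, test_frac=0.25, seed=0):
    """Shuffle rows reproducibly and hold out test_frac of them as a test set."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_frac)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. Runs in O(n_pos * n_neg), which is fine for small test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def calibration_table(labels, probs, n_bins=5):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        bins[k].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_p, observed, len(b)))
    return table
```

Applied to 1967 courses, this split holds out 492 of them for testing. A well-calibrated model yields rows in which the mean predicted probability is close to the observed event rate.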
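The variable-segment analysis (retrain without one segment, then record ΔAUC) is a leave-one-group-out ablation. The sketch below assumes a caller-supplied function that trains a model on a given feature list and returns its test-set AUC. The name `segment_ablation`, the `fit_and_auc` callback, and the sign convention ΔAUC = AUC(without segment) − AUC(all segments) are assumptions; the abstract does not state which sign the authors used.

```python
from typing import Callable, Dict, List


def segment_ablation(segments: Dict[str, List[str]],
                     fit_and_auc: Callable[[List[str]], float]) -> Dict[str, float]:
    """Retrain without each variable segment and report dAUC = AUC_without - AUC_full.

    segments: maps a segment name (e.g. "tumor") to the feature names it contains.
    fit_and_auc: hypothetical callback that trains on the given features and
        returns the held-out AUC.
    Negative values mean that dropping the segment hurt discrimination.
    """
    all_features = [f for feats in segments.values() for f in feats]
    full_auc = fit_and_auc(all_features)
    deltas = {}
    for name, feats in segments.items():
        dropped = set(feats)
        reduced = [f for f in all_features if f not in dropped]
        deltas[name] = fit_and_auc(reduced) - full_auc
    return deltas
```

With a toy scorer in which the tumor features add 0.1 AUC each and age adds 0.01, dropping the tumor segment gives ΔAUC ≈ −0.2 and dropping demographics gives ΔAUC ≈ −0.01. Because each ablation retrains from scratch, the cost is one extra model fit per segment.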