A Comparison of Machine Learning Approaches for Predicting the Progression of Crohn's Disease
Main Authors: | , , , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | The incidence of Crohn's disease (CD) is rising, which calls for more accurate and less invasive diagnostic tools. The concentration of faecal calprotectin (FC) is a reliable indicator of luminal inflammation and can replace invasive and uncomfortable ileocolonoscopies. Studies have confirmed the association of FC levels with the progression of CD, and various machine learning approaches have been used to predict disease progression. In this study, we compared the performance of established machine learning approaches for predicting the progression of CD from a range of variables, including FC levels. Our dataset consisted of records for 804 patients with CD and an FC measurement, from a teaching hospital that cares for secondary and tertiary referred patients. We compared four machine learning approaches, namely logistic regression, support vector machine (SVM), random forest and artificial neural network (ANN), for predicting the likelihood of a flare-up. All four approaches performed strongly, which demonstrates their potential for predicting disease progression. Logistic regression slightly outperformed the others, with an accuracy of 0.90 and an AUC of 0.83. Our dataset had missing data for a number of patients, which resulted in fewer variables being selected for inclusion in the model. Our relatively small sample size could explain why SVM, random forest and the ANN did not achieve higher accuracy than logistic regression in this study. In future work, more variables should be included in the analysis, the outcome period for a flare-up should be explored, and our results should be validated on an independent, larger dataset. |
---|---|
ISSN: | 2643-2447 |
DOI: | 10.1109/SCOReD50371.2020.9251019 |
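
The summary reports two metrics for each classifier, accuracy and AUC, without saying how they were computed. As a minimal sketch of what the two numbers measure, the following uses plain Python; the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the study.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases where the thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study: 1 = flare-up, 0 = no flare-up.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.6, 0.35, 0.4, 0.9]

acc = accuracy(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(f"accuracy={acc:.2f}, AUC={auc:.2f}")
```

Accuracy depends on the chosen threshold and on how common flare-ups are in the data, whereas AUC depends only on how the scores rank positive cases against negative ones. That is why the two figures for one model need not agree, and why a comparison of classifiers typically reports both.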