Impact of Data Heterogeneity on AI/ML Model Accuracy in Assisting Pneumonia Type Prediction
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Pneumonia is the fourth most common cause of mortality, resulting in more than 50,000 deaths in the U.S. alone every year. Cases of this respiratory infection have only been exacerbated by the COVID-19 pandemic as the virus tends to attack airways and gas exchange regions. The diagnosis of COVID-19 pneumonia depends on various factors, including the severity as well as the type of the disease, which physicians attempt to determine preliminarily by analyzing chest X-ray scans. With the enormous amounts of X-ray data, one can utilize an automated procedure to identify the defects in scanned images that, in conjunction with other clinical diagnostics, can lead to the verification of disease presence. Machine learning has emerged as a powerful tool to enable high-accuracy medical diagnostics. In the current work, various neural network algorithms, including the convolutional neural network (CNN), CNN+DenseNet121, CNN+EfficientNetB7, and CNN+ResNet50, were employed to classify chest X-ray images as one of the following diagnoses: Negative for COVID-19 pneumonia, Mild Atypical COVID-19 pneumonia, Moderate Atypical COVID-19 pneumonia, Severe Atypical COVID-19 pneumonia, Mild Indeterminate COVID-19 pneumonia, Moderate Indeterminate COVID-19 pneumonia, Severe Indeterminate COVID-19 pneumonia, Mild Typical COVID-19 pneumonia, Moderate Typical COVID-19 pneumonia, and Severe Typical COVID-19 pneumonia. The CNN, CNN+DenseNet121, CNN+EfficientNetB7, and CNN+ResNet50 models achieved training accuracies of 47.62%, 84.08%, 64.08%, and 74.30% and validation accuracies of 42.29%, 50.25%, 53.98%, and 43.28%, respectively. Moderate classification performance across all four of the models suggests that data heterogeneity, particularly the presence of ten similar diagnostic scenarios, greatly limits the potential of machine learning in medical diagnostics. Nevertheless, data manipulations and advanced modeling are being studied further to overcome this barrier. |
---|---|
ISSN: | 2834-8249 |
DOI: | 10.1109/IAICT62357.2024.10617531 |
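
The figures in the Summary can be put side by side with a little arithmetic. The sketch below is not taken from the paper. It rebuilds the ten diagnosis labels listed in the Summary as one negative class plus three appearance categories times three severity levels, and it computes the gap between training and validation accuracy for each model from the reported percentages. The names `build_labels`, `accuracy_gaps`, and `CHANCE` are hypothetical, and the uniform-guess baseline assumes balanced classes, which the record does not state.

```python
from itertools import product

# Label components as they appear in the Summary's list of diagnoses.
SEVERITIES = ["Mild", "Moderate", "Severe"]
APPEARANCES = ["Atypical", "Indeterminate", "Typical"]


def build_labels():
    """Return the ten diagnosis labels in the order the Summary lists them."""
    labels = ["Negative for COVID-19 pneumonia"]
    labels += [
        f"{severity} {appearance} COVID-19 pneumonia"
        for appearance, severity in product(APPEARANCES, SEVERITIES)
    ]
    return labels


# Accuracies in percent, copied from the Summary: (training, validation).
REPORTED = {
    "CNN": (47.62, 42.29),
    "CNN+DenseNet121": (84.08, 50.25),
    "CNN+EfficientNetB7": (64.08, 53.98),
    "CNN+ResNet50": (74.30, 43.28),
}


def accuracy_gaps(reported):
    """Training minus validation accuracy, in percentage points."""
    return {
        name: round(train - val, 2)
        for name, (train, val) in reported.items()
    }


LABELS = build_labels()
GAPS = accuracy_gaps(REPORTED)

# Assumption: with balanced classes, uniform guessing over the labels
# would score 100 / number_of_classes percent.
CHANCE = 100 / len(LABELS)

if __name__ == "__main__":
    print(f"{len(LABELS)} classes, uniform-guess baseline {CHANCE:.1f}%")
    for name, gap in GAPS.items():
        print(f"{name}: train/validation gap {gap:.2f} points")
```

Read this way, the two models with the highest training accuracy (CNN+DenseNet121 and CNN+ResNet50) also show the largest gaps, above 30 percentage points, while CNN+EfficientNetB7 has the highest validation accuracy with a gap of about 10 points.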