Loading…

Application of data mining methods to improve screening for the risk of early gastric cancer

Although gastric cancer is a malignancy with high morbidity and mortality in China, the survival rate of patients with early gastric cancer (EGC) is high after surgical resection. To strengthen diagnosing and screening is the key to improve the survival and life quality of patients with EGC. This st...

Full description

Saved in:
Bibliographic Details
Published in:BMC medical informatics and decision making 2018-12, Vol.18 (Suppl 5), p.121-32, Article 121
Main Authors: Liu, Mi-Mi, Wen, Li, Liu, Yong-Jia, Cai, Qiao, Li, Li-Ting, Cai, Yong-Ming
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Although gastric cancer is a malignancy with high morbidity and mortality in China, the survival rate of patients with early gastric cancer (EGC) is high after surgical resection. To strengthen diagnosing and screening is the key to improve the survival and life quality of patients with EGC. This study applied data mining methods to improve screening for the risk of EGC on the basis of noninvasive factors, and displayed important influence factors for the risk of EGC. The dataset was derived from a project of the First Hospital Affiliated Guangdong Pharmaceutical University. A series of questionnaire surveys, serological examinations and endoscopy plus pathology biopsy were conducted in 618 patients with gastric diseases. Their risk of EGC was categorized into low and high risk of EGC by the results of endoscopy plus pathology biopsy. The synthetic minority oversampling technique (SMOTE) was used to solve imbalance categories of the risk of EGC. Four classification models of the risk of EGC was established, including logistic regression (LR) and three data mining algorithms. The three data mining models had higher accuracy than the LR model. Gain curves of the three data mining models were convexes more closer to ideal curves by contrast with that of the LR model. AUC of the three data mining models were larger than that of the LR model as well. The three data mining models predicted the risk of EGC more effectively in comparison with the LR model. Moreover, this study found 16 important influence factors for the risk of EGC, such as occupations, helicobacter pylori infection, drinking hot water and so on. The three data mining models have optimal predictive behaviors over the LR model, therefore can effectively evaluate the risk of EGC and assist clinicians in improving the diagnosis and screening of EGC. Sixteen important influence factors for the risk of EGC were illustrated, which may helpfully assess gastric carcinogenesis, and remind to early prevention and early detection of gastric cancer. This study may also be conducive to clinical researchers in selecting and conducting the optimal predictive models.
ISSN:1472-6947
1472-6947
DOI:10.1186/s12911-018-0689-4