Lung cancer prediction using multi-gene genetic programming by selecting automatic features from amino acid sequences
Published in: | Computational biology and chemistry 2022-06, Vol.98, p.107638-107638, Article 107638 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Lung cancer is one of the leading causes of cancer-related deaths. Early diagnosis of lung cancer using automatic feature selection from a large number of features is a challenging task. Conventionally, cancer diagnosis approaches rely on physical features that appear in later stages, after abnormal somatic mutations have already caused harm. To extract useful novel patterns that efficiently predict cancer at early stages, we analyzed lung-cancer-related mutated genes, which reveal useful information in protein amino acid sequences. For this, we developed a new evolutionary learning technique based on a biologically inspired multi-gene genetic programming algorithm that uses discriminant information of protein amino acids. The proposed model efficiently selects 23 discriminant features out of 1500 features. It then optimally combines the selected features with related primitive functions to predict lung cancer. The result is an efficient predictive model that helps in understanding the complex, heterogeneous nature of lung cancer. The proposed system achieved an area under the ROC curve of 98.79% and an accuracy of 95.67%, outperforming related lung cancer prediction approaches.
•Application of machine learning in health care for lung cancer prediction.
•Classification of cancer vs. non-cancer protein amino acid sequences using supervised learning.
•Transformation of protein amino acid sequences into various feature spaces.
•Automatic feature selection with bio-inspired multi-gene genetic programming. |
---|---|
ISSN: | 1476-9271 1476-928X |
DOI: | 10.1016/j.compbiolchem.2022.107638 |
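
The summary mentions transforming protein amino acid sequences into feature spaces before feature selection, but this record does not name the specific encodings used. As an illustration only, the sketch below computes two commonly used sequence descriptors, amino acid composition (20 values) and dipeptide composition (400 values). The choice of these descriptors, the function names, and the toy sequence are all assumptions made for the example, not details taken from the article.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each ordered residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[p] for p in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def featurize(seq):
    """Concatenate both descriptors into one 420-value feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    vector = featurize("MKVLAAGIV")
    print(len(vector))
```

A vector like this, built for each sequence, is the kind of input from which a feature selection step could then pick a small discriminant subset, as the summary describes for its 23 of 1500 features.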