Lung cancer prediction using multi-gene genetic programming by selecting automatic features from amino acid sequences

Bibliographic Details
Published in: Computational biology and chemistry, 2022-06, Vol. 98, p. 107638, Article 107638
Main Authors: Sattar, Mohsin; Majid, Abdul; Kausar, Nabeela; Bilal, Muhammad; Kashif, Muhammad
Format: Article
Language: English
Description
Summary: Lung cancer is one of the leading causes of cancer-related deaths. Early diagnosis of lung cancer using automatic feature selection from a large number of features is a challenging task. Conventionally, cancer diagnosis approaches use physical features that appear in later stages, when harmful effects have already occurred due to abnormal somatic mutations. To extract useful novel patterns for efficiently predicting cancer at early stages, we analyzed lung cancer-related mutated genes that reveal useful information in protein amino acid sequences. For this, we developed a new evolutionary learning technique with a biologically inspired multi-gene genetic programming algorithm using discriminant information of protein amino acids. The proposed model efficiently selects 23 discriminant features out of 1500 features. It then optimally combines the selected features and related primitive functions for prediction of lung cancer. Hence, an efficient predictive model is constructed that helps in understanding the complex heterogeneous nature of lung cancer. The proposed system achieved area under the ROC curve and accuracy values of 98.79% and 95.67%, respectively, outperforming related lung cancer prediction approaches.
• Application of machine learning in health care for lung cancer prediction.
• Classification of cancer vs. non-cancer protein amino acid sequences using supervised learning.
• Transformation of protein amino acid sequences into various feature spaces.
• Automatic feature selection with bio-inspired multi-gene genetic programming.
ISSN: 1476-9271, 1476-928X
DOI: 10.1016/j.compbiolchem.2022.107638
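
The summary mentions transforming protein amino acid sequences into feature spaces, but this record does not say which ones the authors used. As an illustration only, the sketch below computes amino acid composition, a commonly used sequence-derived feature space that maps a sequence to 20 residue frequencies. The function name `amino_acid_composition` and the example sequence are hypothetical and are not taken from the article.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every
# sequence maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Illustrative assumption: the article's actual feature spaces
    are not specified in this record.
    """
    # Ignore any character that is not a standard residue code.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not real data from the study.
    features = amino_acid_composition("AACD")
    print(dict(zip(AMINO_ACIDS, features)))
```

Feature vectors of this kind could then be passed to a feature selection step such as the multi-gene genetic programming approach the summary describes. That step is not sketched here because the record gives no details of it.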