An integrative analysis system of gene expression using self-paced learning and SCAD-Net

Bibliographic Details
Published in: Expert Systems with Applications, 2019-11, Vol. 135, p. 102-112
Main Authors: Huang, Hai-Hui; Liang, Yong
Format: Article
Language: English
Description
Summary:
• A new integrative analysis system for gene-expression value enhancement is proposed.
• A novel biomarker selection method and its algorithm are proposed.
• Some potential therapeutic markers and pathways for NSCLC are provided.

Few proposed gene biomarkers have been satisfactory in clinical applications, mainly because of the small sample sizes of individual studies. Because of the batch effect, different gene-expression studies cannot be merged directly. Many integrative methods have attempted to combine various datasets and eliminate the batch effect while keeping biological information intact. However, due to the complexity of the batch effect, it cannot be eliminated, and these methods may even add new systematic errors to the data, further complicating the integrated data. Direct analysis of the merged data may therefore cause problems. In this paper, we suggest a novel integrative analysis framework for merged gene-expression data. The framework adopts self-paced learning, which allows samples to be added to the training process automatically, from simple to intricate, in a purely self-paced way. Moreover, the framework includes a new feature selection method, SCAD-Net regularization, which combines SCAD and network-based penalties to integrate biological network knowledge. The simulation shows that the proposed method outperforms the benchmark, with more accurate marker identification. The analysis of seven large NSCLC gene-expression datasets shows that the proposed method not only achieves higher accuracy but also identifies potential therapeutic markers and pathways in NSCLC. In conclusion, we provide a new and efficient integrative analysis system for gene expression to support the search for new, reliable biomarkers for diagnosis or targeted therapy.
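The summary names two ingredients, a SCAD-plus-network penalty and self-paced sample selection, without giving their formulas. The following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes the standard SCAD penalty with the common default a = 3.7, an unnormalized graph-Laplacian quadratic form as the network penalty, and the usual hard-threshold self-paced weighting. The function names and the parameters `lam`, `gamma`, and `age` are hypothetical.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty in its standard three-piece form (requires a > 2)."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b                                             # |b| <= lam
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))     # lam < |b| <= a*lam
    large = np.full_like(b, lam**2 * (a + 1) / 2)               # |b| > a*lam
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))

def network_penalty(beta, adjacency):
    """Assumed network term: beta^T L beta with L = D - A (unnormalized Laplacian)."""
    beta = np.asarray(beta, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return float(beta @ L @ beta)

def scad_net_penalty(beta, adjacency, lam, gamma, a=3.7):
    """Assumed combination: summed SCAD term plus gamma times the network term."""
    return float(scad_penalty(beta, lam, a).sum()) + gamma * network_penalty(beta, adjacency)

def self_paced_weights(losses, age):
    """Hard self-paced weights: keep samples whose loss is below the age threshold."""
    return (np.asarray(losses, dtype=float) < age).astype(float)
```

In a self-paced training loop under these assumptions, the model would be refit on the currently selected samples and `age` raised between rounds, so that harder samples enter the training set gradually.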
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2019.06.016