Abstract 5367: Deep neural networks using protein-protein network information predict multiple myeloma survival

Bibliographic Details
Published in: Cancer research (Chicago, Ill.), 2023-04, Vol.83 (7_Supplement), p.5367-5367
Main Authors: Zhu, Jiening, Oh, Jung Hun, Simhal, Anish K., Elkin, Rena, Norton, Larry, Deasy, Joseph O., Tannenbaum, Allen R.
Format: Article
Language:English
Description
Summary:The modern development of sequencing technologies provides a comprehensive molecular portrait of human cancers. There is a strong need to develop methods that not only improve patient prognosis predictions but also clarify the driving factors for treatment. However, the high-dimension, low-sample-size nature of genomic data poses challenges for typical machine learning algorithms. Understanding genes systematically with respect to a network, here a protein-protein interaction (PPI) network, is one way to address this limitation, and the nonparametric analysis of geometric properties such as Ollivier-Ricci curvature and the associated invariant measure, developed by our group, has proven successful for predicting survival in multiple cancers. In this work, we propose a novel supervised deep learning approach that incorporates the aforementioned geometric methods; it benefits from the flexibility of deep learning techniques while preserving much of the interpretability of the geometric analysis. We take advantage of a state-of-the-art graph neural network approach. Sparse connections between layers are inspired by the known biology of the PPI network from the Human Protein Reference Database (HPRD) and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and are supplemented with geometric network features that are fed into the network at the corresponding layers. The prediction is based on a local-global principle: highly predictive features are selected from early layers of the network and fed directly to the final layer to produce a multivariable Cox regression. We applied our method to RNA-Seq gene expression data from the CoMMpass study of multiple myeloma (MM). More specifically, the 657 patients in the data set were randomly divided into training, validation, and set-aside testing sets at a ratio of 6:2:2. We obtained an average C-index of 0.66 for the prediction in the testing set from a 10-fold data split. Dichotomizing the testing set by its mean value to define high-risk vs. low-risk groups yielded a significant log-rank test p-value in the set-aside data (p-value = 3e-4). We observed that geometric protein network information not only improved the outcome prediction (vs. 6% worse without geometric feature inputs), but also made the prediction more robust to fold splitting. From our model, we identified WEE1, CENPE and CENPF as top genes driving survival differences (higher expression of WEE1 increased risk and lower negative curvatu
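For readers unfamiliar with the evaluation terms in the summary, the following is a minimal illustrative sketch, not the authors' code. It shows one way a 6:2:2 random split of 657 patients could be drawn and how a concordance index (C-index) is computed under Harrell's standard definition. The function names `split_indices` and `concordance_index` are hypothetical, the rounding of the split sizes is an assumption, and tied survival times are ignored for simplicity.

```python
import random


def split_indices(n, ratios=(6, 2, 2), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Sizes use integer division (an assumption); the remainder goes to the
    last part, so every index is used exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter time than patient j. The pair is concordant when the
    model assigns patient i the higher risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    train, val, test = split_indices(657)
    print(len(train), len(val), len(test))
    # Toy data: risk decreases as survival time increases.
    print(concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported average of 0.66. Changing `seed` yields a different random split, which is one plausible way to repeat the evaluation over several splits.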
ISSN:1538-7445
DOI:10.1158/1538-7445.AM2023-5367