
Multi-ensemble machine learning framework for omics data integration: A case study using breast cancer samples

Bibliographic Details
Published in: Informatics in Medicine Unlocked, 2024, Vol. 47, p. 101507, Article 101507
Main Authors: Tembhare, Kunal, Sharma, Tina, Kasibhatla, Sunitha M., Achalere, Archana, Joshi, Rajendra
Format: Article
Language: English
Description
Summary: Integration of voluminous omics data helps unravel the biological complexities associated with different disease phenotypes. Machine learning (ML) approaches provide insightful techniques for systematic multi-omics data integration. In this study, survival prediction of breast cancer patients was undertaken using omics data of 302 female patients from The Cancer Genome Atlas (TCGA). The data included gene expression, miRNA expression, DNA methylation and copy number variation. Three computational multi-ensemble ML pipelines were tested using the Support Vector Machine (SVM), Random Forest (RF) and Partial Least Squares-Discriminant Analysis (PLS-DA) algorithms. To overcome the limitations associated with univariate feature selection criteria, the ML pipelines were built along with latent factors obtained by a multivariate dimension reduction method. This facilitated investigation of background genetic networks and identification of potential hub genes. Analysis of the results revealed that SVM with the PLS-DA method (integrating the gene expression, DNA methylation, and miRNA expression modalities) was the best-performing model, with an Area Under Curve (AUC) of 89% and an accuracy of 83% for survival prediction. This study not only corroborated previously reported breast cancer-specific prognostic biomarkers but also predicted additional potential biomarkers. The work demonstrates the effective use of a multi-ensemble ML model with efficient feature selection methods as a robust protocol for cancer genotype-to-phenotype correlation.
• A multi-omics data integration framework based on a supervised machine learning approach.
• Multi-ensemble models built using SVM/RF and PLS-DA to predict breast cancer survival.
• Use of a combination of univariate and multivariate feature selection methods.
• Identification of a larger network of prognostic biomarkers and hub genes.
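The summary reports model quality as an AUC and an accuracy. As a minimal sketch of how these two metrics are typically computed for a binary survival classifier, the Python below uses a small set of hypothetical labels and scores. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions and are not taken from the article or its pipelines.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the Mann-Whitney U formulation; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


# Hypothetical toy data: 1 and 0 stand for the two survival classes,
# and scores stand for a classifier's predicted probability of class 1.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"AUC = {auc:.4f}, accuracy = {acc:.2f}")
```

AUC is threshold-independent, while accuracy depends on the chosen cut-off, which is why the two are usually reported together.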
ISSN:2352-9148
DOI:10.1016/j.imu.2024.101507