Loading…
Multi-ensemble machine learning framework for omics data integration: A case study using breast cancer samples
Integration of voluminous omics data aids to unravel biological complexities associated with different disease phenotypes. Machine learning (ML) approaches provide insightful techniques for systematic multi-omics data integration. In this study, survival prediction of breast cancer patients was unde...
Saved in:
Published in: | Informatics in medicine unlocked 2024, Vol.47, p.101507, Article 101507 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Integration of voluminous omics data aids to unravel biological complexities associated with different disease phenotypes. Machine learning (ML) approaches provide insightful techniques for systematic multi-omics data integration. In this study, survival prediction of breast cancer patients was undertaken using omics data of 302 female patients from The Cancer Genome Atlas (TCGA). The data included gene expression, miRNA expression, DNA methylation and copy number variation. Three computational multi-ensemble ML pipelines were tested using Support Vector Machine (SVM), Random Forest (RF) and Partial Least Squares-Discriminant Analysis (PLS-DA) algorithms. To overcome the limitations associated with univariate feature selection criteria, the ML pipelines were built along with latent factors obtained by multivariate dimension reduction method. This facilitated investigation of background genetic networks and identification of potential hub genes. Analysis of the results obtained revealed that SVM with PLS-DA method (integrated with gene expression, DNA methylation, and miRNA expression modalities) was the best-performing model with an Area Under Curve (AUC) of 89% and an accuracy of 83% for survival prediction. This study not only corroborated previously reported breast cancer-specific prognostic biomarkers but also predicted additional potential biomarkers. The work demonstrates the effective use of a multi-ensemble ML model with efficient feature selection methods as a robust protocol for cancer genotype to phenotype correlation.
•A multi-omics data integration framework based on supervised machine learning approach.•Multi-ensemble models built using SVM/RF and PLS-DA to predict Breast Cancer survival.•Use of combination of uni- and multivariate feature selection methods.•Identification of larger network of prognostic bio-markers and hub genes. |
---|---|
ISSN: | 2352-9148 2352-9148 |
DOI: | 10.1016/j.imu.2024.101507 |