scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data

Bibliographic Details
Published in: Machine learning: science and technology 2023-09, Vol.4 (3), p.35013
Main Authors: Lin, Eric; Liu, Boyuan; Lac, Leann; Fung, Daryl L X; Leung, Carson K; Hu, Pingzhao
Format: Article
Language: English
Description
Summary: Cell type identification using single-cell RNA sequencing (scRNA-seq) data is critical for understanding disease mechanisms and for drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational graph autoencoder for scRNA-seq data (scGMM-VGAE) that integrates a statistical clustering model into a deep learning algorithm to significantly improve cell clustering performance. The model feeds a cell-cell graph adjacency matrix and a gene feature matrix into a variational graph autoencoder (VGAE) to generate latent data. These data are then used for cell clustering by the Gaussian mixture model (GMM) module. To optimize the algorithm, a dedicated loss function is derived by combining parameter estimates from the GMM and the VGAE. We test the proposed method on four publicly available and three simulated datasets, which contain many biological and technical zeros. scGMM-VGAE outperforms four selected baseline methods on three evaluation metrics in cell clustering. By incorporating a GMM into a deep learning VGAE, the proposed method achieves higher cell clustering accuracy on scRNA-seq data. This improvement has a significant impact on detecting rare cell types in health research. All source code used in this study can be found at https://github.com/ericlin1230/scGMM-VGAE.
ISSN: 2632-2153
DOI: 10.1088/2632-2153/acd7c3
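
Illustrative sketch: The data flow described in the summary (cell-cell graph plus gene feature matrix, then a latent representation, then GMM clustering) can be outlined roughly as below. This is not the authors' implementation. It assumes a k-nearest-neighbour graph as the cell-cell graph and replaces the trained VGAE encoder with a single untrained graph-convolution step using random weights, so there is no joint loss and no training. The names `normalize_adjacency` and `cluster_cells`, all parameter values, and the toy input data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import kneighbors_graph


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as used by GCN-style encoders."""
    adj = adj + np.eye(adj.shape[0])  # add self-loops so each cell keeps its own features
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def cluster_cells(features, n_clusters, n_neighbors=10, latent_dim=8, seed=0):
    """Cluster cells from a (cells x genes) matrix via graph smoothing and a GMM."""
    rng = np.random.default_rng(seed)

    # Cell-cell graph: symmetrized k-nearest-neighbour adjacency (assumed construction).
    knn = kneighbors_graph(features, n_neighbors=n_neighbors, include_self=False)
    adj = knn.toarray()
    adj = np.maximum(adj, adj.T)
    a_hat = normalize_adjacency(adj)

    # Stand-in for the VGAE encoder: one graph-convolution step with
    # random, untrained weights (assumption, not the model in the paper).
    n_genes = features.shape[1]
    weights = rng.normal(size=(n_genes, latent_dim)) / np.sqrt(n_genes)
    latent = a_hat @ features @ weights

    # GMM module: fit a Gaussian mixture on the latent data and assign clusters.
    gmm = GaussianMixture(
        n_components=n_clusters, covariance_type="diag", random_state=seed
    )
    labels = gmm.fit_predict(latent)
    return latent, labels


# Synthetic toy matrix (150 "cells" x 20 "genes"), for demonstration only.
toy_rng = np.random.default_rng(0)
toy = np.vstack([toy_rng.normal(loc=c, size=(50, 20)) for c in (0.0, 5.0, 10.0)])
latent, labels = cluster_cells(toy, n_clusters=3)
print(latent.shape, np.unique(labels))
```

In the published method the encoder and the mixture model are optimized together through the combined loss mentioned in the summary; the sketch only mirrors the order of the steps.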