Loading…

CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning

Single cell RNA sequencing (scRNA-seq) allows global transcriptomic profiling at a cellular resolution, thus, identifying underlying cell types and corresponding lineages. Such cell type identification and annotation rely heavily on models that learn by training themselves on a large amount of indiv...

Full description

Saved in:
Bibliographic Details
Published in:Applied intelligence (Dordrecht, Netherlands) Netherlands), 2023, Vol.53 (2), p.1287-1305
Main Authors: Seal, Dibyendu Bikash, Das, Vivek, De, Rajat K.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Single cell RNA sequencing (scRNA-seq) allows global transcriptomic profiling at a cellular resolution, thus, identifying underlying cell types and corresponding lineages. Such cell type identification and annotation rely heavily on models that learn by training themselves on a large amount of individual cells with accurate, annotated labels. Presently, this task of cell-type annotation is done based on inspection of marker genes from each of the statistically significant groups of cells. This is both challenging and time consuming. In this article, we have proposed a semi-supervised cell-type annotation method, called CASSL, based on Non-negative matrix factorization (NMF) coupled with recursive k-means algorithm. A semi-supervised model is capable of learning labels for a large amount of unlabelled data with the help of a limited amount of labelled data. The effectiveness of CASSL has been demonstrated on eight publicly available human and mice scRNA-seq datasets across varied organs and protocols. It has been able to correctly annotate majority of the unlabelled cells with high accuracy. It has also been evaluated for its correctness of clustering solution, robustness across varying percentage of missing labels, and time taken for execution. When compared with state-of-the-art unsupervised and semi-supervised cell-type annotation methods, CASSL has consistently outperformed others across all metrics for most of the datasets. It has also shown competitive results when compared against state-of-the-art supervised methods.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-022-03440-4