Loading…

Single-cell type annotation with deep learning in 265 cell types for humans

Abstract Motivation Annotating cell types is a challenging yet essential task in analyzing single-cell RNA sequencing data. However, due to the lack of a gold standard, it is difficult to evaluate the algorithms fairly and an overfitting algorithm may be favored in benchmarks. To address this challe...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics advances 2024, Vol.4 (1), p.vbae054-vbae054
Main Authors: Dong, Sherry, Deng, Kaiwen, Huang, Xiuzhen
Format: Article
Language:English
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Motivation Annotating cell types is a challenging yet essential task in analyzing single-cell RNA sequencing data. However, due to the lack of a gold standard, it is difficult to evaluate the algorithms fairly and an overfitting algorithm may be favored in benchmarks. To address this challenge, we developed a deep learning-based single-cell type prediction tool that assigns the cell type to 265 different cell types for humans, based on data from approximately five million cells. Results We achieved a median area under the ROC curve (AUC) of 0.93 when evaluated across datasets. We found that inconsistent labeling in the existing database generated by different labs contributed to the mistakes of the model. Therefore, we used cell ontology to correct the annotations and retrained the model, which resulted in 0.971 median AUC. Our study reveals a limiting factor of the accuracy one may achieve with the current database annotation and points to the solutions towards an algorithm-based correction of the gold standard for future automated cell annotation approaches. Availability and implementation The code is available at: https://github.com/SherrySDong/Hierarchical-Correction-Improves-Automated-Single-cell-Type-Annotation. Data used in this study are listed in Supplementary Table S1 and are retrievable at the CZI database.
ISSN:2635-0041
2635-0041
DOI:10.1093/bioadv/vbae054