
Strategies for cellular deconvolution in human brain RNA sequencing data [version 1; peer review: 1 approved, 1 approved with reservations]

Bibliographic Details
Published in: F1000 research 2021, Vol. 10, p. 750
Main Authors: Sosina, Olukayode A., Tran, Matthew N., Maynard, Kristen R., Tao, Ran, Taub, Margaret A., Martinowich, Keri, Semick, Stephen A., Quach, Bryan C., Weinberger, Daniel R., Hyde, Thomas, Hancock, Dana B., Kleinman, Joel E., Leek, Jeffrey T., Jaffe, Andrew E.
Format: Article
Language: English
Description
Summary:
Background: Statistical deconvolution strategies have emerged over the past decade to estimate the proportion of various cell populations in homogenate tissue sources like brain using gene expression data. However, no study has been undertaken to assess the extent to which expression-based and DNAm-based cell type composition estimates agree.

Results: Using estimated neuronal fractions from DNAm data, from the same brain region (i.e., matched) as our bulk RNA-Seq dataset, as proxies for the true unobserved cell-type fractions (i.e., as the gold standard), we assessed the accuracy (RMSE) and concordance (R²) of four reference-based deconvolution algorithms: Houseman, CIBERSORT, non-negative least squares (NNLS)/MIND, and MuSiC. We did this for two cell-type populations, neurons and non-neurons/glia, using matched single nuclei RNA-Seq and mismatched single cell RNA-Seq reference datasets. With the mismatched single cell RNA-Seq reference dataset, Houseman, MuSiC, and NNLS produced concordant (high correlation; Houseman R² = 0.51, 95% CI [0.39, 0.65]; MuSiC R² = 0.56, 95% CI [0.43, 0.69]; NNLS R² = 0.54, 95% CI [0.32, 0.68]) but biased (high RMSE, >0.35) neuronal fraction estimates. CIBERSORT produced more discordant (moderate correlation; R² = 0.25, 95% CI [0.15, 0.38]) neuronal fraction estimates, but with less bias (low RMSE, 0.09). Using the matched single nuclei RNA-Seq reference dataset did not eliminate bias (MuSiC RMSE = 0.17).

Conclusions: Our results together suggest that many existing RNA deconvolution algorithms estimate the RNA composition of homogenate tissue, e.g. the amount of RNA attributable to each cell type, and not the cellular composition, which relates to the underlying fraction of cells.
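The summary evaluates deconvolution estimates by accuracy (RMSE) and concordance (R²), and it distinguishes RNA composition from cellular composition. The Python sketch below is one minimal way to illustrate those ideas. It is not the authors' pipeline. The function names, the normalization of NNLS coefficients to sum to one, the use of squared Pearson correlation for R², and the `rna_ratio` parameter are all assumptions introduced here for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_fractions(bulk, signatures):
    """Estimate cell-type fractions for one bulk sample with NNLS.

    bulk:       (genes,) expression vector for the homogenate sample.
    signatures: (genes, cell_types) reference expression profiles.
    Coefficients are rescaled to sum to 1 (an assumption of this sketch).
    """
    coef, _residual = nnls(np.asarray(signatures, float), np.asarray(bulk, float))
    total = coef.sum()
    return coef / total if total > 0 else coef


def rmse(estimated, reference):
    """Root mean squared error between estimated and reference fractions."""
    estimated = np.asarray(estimated, float)
    reference = np.asarray(reference, float)
    return float(np.sqrt(np.mean((estimated - reference) ** 2)))


def r_squared(estimated, reference):
    """Squared Pearson correlation, used here as the concordance measure."""
    r = np.corrcoef(estimated, reference)[0, 1]
    return float(r ** 2)


def rna_fraction(cell_fraction, rna_ratio):
    """RNA fraction contributed by a cell type that makes up `cell_fraction`
    of the cells and carries `rna_ratio` times as much RNA per cell as the
    remaining cells (hypothetical parameter, not taken from the article)."""
    weighted = rna_ratio * cell_fraction
    return weighted / (weighted + (1.0 - cell_fraction))
```

Under these assumptions, estimates that are shifted by a constant offset from the reference fractions have a perfect R² but an RMSE equal to the offset, which is the "concordant but biased" pattern described in the results. Likewise, `rna_fraction(0.5, 2.0)` gives about 0.67, showing how an estimate of RNA composition can differ from the underlying fraction of cells.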
ISSN: 2046-1402
DOI: 10.12688/f1000research.50858.1