RNA-seq reproducibility of Pseudomonas aeruginosa in laboratory models of cystic fibrosis
Published in: | Microbiology Spectrum 2024-12, p.e0151324 |
---|---|
Format: | Article |
Language: | English |
Summary: | Reproducibility is a fundamental expectation in science and enables investigators to have confidence in their research findings and the ability to compare data from disparate sources, but evaluating reproducibility can be elusive. For example, generating RNA sequencing (RNA-seq) data includes multiple steps where variance can be introduced. Thus, it is unclear if RNA-seq data from different sources can be validly compared. While most studies on RNA-seq reproducibility focus on eukaryotes, we evaluate bias in bacteria using Pseudomonas aeruginosa gene expression data from five laboratory models of cystic fibrosis. We leverage a large data set that includes samples prepared in three different laboratories and paired data sets where the same sample was sequenced using at least two different sequencing pipelines. We report here that expression data are highly reproducible across laboratories. In addition, while samples sequenced with different sequencing pipelines showed significantly more variance in expression profiles than between labs, gene expression was still highly reproducible between sequencing pipelines. Further investigation of expression differences between two sequencing pipelines revealed that library preparation methods were the largest source of error, though analyses to identify the source of this variance were inconclusive. Consistent with the reproducibility of expression between sequencing pipelines, we found that different pipelines detected over 80% of the same differentially expressed genes with large expression differences between conditions. Thus, bacterial RNA-seq data from different sources can be validly compared, facilitating the ability to advance understanding of bacterial behavior and physiology using the wide array of publicly available RNA-seq data sets. IMPORTANCE: RNA sequencing (RNA-seq) has revolutionized biology, but many steps in RNA-seq workflows can introduce variance, potentially compromising reproducibility. While reproducibility in RNA-seq has been thoroughly investigated in eukaryotes, less is known about pipelines and workflows that introduce variance and biases in bacterial RNA-seq data. By leveraging Pseudomonas aeruginosa transcriptomes in cystic fibrosis models from different laboratories and sequenced with different sequencing pipelines, we directly assess sources of bacterial RNA-seq variance. RNA-seq data were highly reproducible, with the largest variance due to sequencing pipelines, specifically library preparation. Different sequencing pipelines detect |
---|---|
ISSN: | 2165-0497 |
DOI: | 10.1128/spectrum.01513-24 |