The effects of sequencing strategies on Metagenomic pathogen detection using bronchoalveolar lavage fluid samples

Bibliographic Details
Published in: Heliyon 2024-07, Vol. 10 (13), p. e33429, Article e33429
Main Authors: Li, Ziyang, Guo, Zhe, Wu, Weimin, Tan, Li, Long, Qichen, Xia, Han, Hu, Min
Format: Article
Language: English
Description
Summary: Metagenomic next-generation sequencing (mNGS) is a powerful tool for pathogen detection. Its accuracy depends on both wet lab and dry lab procedures. The objective of our study was to assess the influence of read length and dataset size on pathogen detection. In this study, 43 clinical bronchoalveolar lavage fluid (BALF) samples, which tested positive via clinical mNGS and were consistent with the diagnosis, were re-sequenced on the Illumina NovaSeq 6000 platform. The raw re-sequencing data, consisting of 100 million (M) paired-end 150 bp (PE150) reads, were divided into simulated datasets with eight different data sizes (5 M, 10 M, 15 M, 20 M, 30 M, 50 M, 75 M, 100 M) and five different read lengths (single-end 50 bp (SE50), SE75, SE100, PE100, and PE150). The Kraken2 and IDseq bioinformatics pipelines were both used to detect the previously diagnosed pathogens in the simulated data. Pathogens were called as detected using read-count thresholds ranging from 1 to 10 and RPM (reads per million) thresholds ranging from 0.2 to 2. Our results revealed that increasing dataset sizes and read lengths can enhance the performance of mNGS in pathogen detection. However, larger data sizes require higher economic costs and longer turnaround times for data analysis. Our findings indicate that 20 M reads are sufficient in SE75 mode to achieve high recall rates. Additionally, high nucleic acid loads in samples can make pathogen detection efficiency more stable, reducing the impact of sequencing strategies. The choice of bioinformatics pipeline had a significant impact on the recall rates achieved in pathogen detection. Increasing dataset sizes and read lengths can enhance the performance of mNGS in pathogen detection but increases the economic and time costs of sequencing and data analysis. Currently, 20 M reads in SE75 mode may be the best sequencing option.
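The dataset simulation and threshold-based detection calls described in the summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that each simulated dataset is produced by randomly subsampling fragments from the PE150 run and trimming reads to the target length, keeping only read 1 for the SE modes. It also assumes dataset size counts fragments, and that a pathogen is called detected when either its assigned read count or its RPM reaches a single cutoff. The names `MODES`, `simulate_dataset`, `rpm`, and `is_detected` are hypothetical. The per-pathogen read counts would come from a classifier such as Kraken2 or IDseq, which is not modeled here.

```python
import random
from typing import List, Tuple

# One PE150 fragment as (read 1, read 2) sequence strings.
Pair = Tuple[str, str]

# Hypothetical mode table: name -> (read length in bp, keep read 2?).
# Assumption: every mode is derived by trimming PE150 reads; SE modes keep read 1 only.
MODES = {
    "SE50": (50, False),
    "SE75": (75, False),
    "SE100": (100, False),
    "PE100": (100, True),
    "PE150": (150, True),
}


def simulate_dataset(pairs: List[Pair], n_fragments: int, mode: str,
                     seed: int = 0) -> List[Tuple[str, ...]]:
    """Randomly subsample fragments (without replacement) and trim them to `mode`."""
    length, paired = MODES[mode]
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    chosen = rng.sample(pairs, n_fragments)  # raises ValueError if n_fragments > len(pairs)
    if paired:
        return [(r1[:length], r2[:length]) for r1, r2 in chosen]
    return [(r1[:length],) for r1, _ in chosen]


def rpm(pathogen_reads: int, total_reads: int) -> float:
    """Reads per million: pathogen-assigned reads normalized to 1 M sequenced reads."""
    return pathogen_reads * 1_000_000 / total_reads


def is_detected(pathogen_reads: int, total_reads: int,
                criterion: str, threshold: float) -> bool:
    """Call a pathogen under one hypothetical criterion: a raw read count or an RPM cutoff."""
    if criterion == "reads":
        return pathogen_reads >= threshold
    if criterion == "rpm":
        return rpm(pathogen_reads, total_reads) >= threshold
    raise ValueError(f"unknown criterion: {criterion!r}")


if __name__ == "__main__":
    toy = [("A" * 150, "C" * 150) for _ in range(10)]  # toy reads, not real data
    se75 = simulate_dataset(toy, 4, "SE75")
    print(len(se75), len(se75[0][0]))  # 4 75
    # 10 pathogen reads in a 20 M-read dataset is 0.5 RPM: detected at an
    # RPM cutoff of 0.2, missed at 2, and detected at a 10-read cutoff.
    print(is_detected(10, 20_000_000, "rpm", 0.2),
          is_detected(10, 20_000_000, "rpm", 2),
          is_detected(10, 20_000_000, "reads", 10))  # True False True
```

The worked numbers show why the cutoff range matters: the same 10 reads pass a permissive RPM threshold but fail a strict one, and shrinking the dataset at a fixed pathogen load lowers the raw count that a read-count threshold sees.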
• Using 20 M reads in SE75 mode could optimize mNGS detection performance while remaining cost-effective.
• Samples with adequate pathogenic nucleic acid loads are less affected by sequencing strategies.
• Benchmarking of mNGS tools is necessary to achieve accurate and reliable results.
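Because every sample in this study was already known to carry its diagnosed pathogen, recall reduces to the fraction of samples in which that pathogen is still called. A hypothetical sketch of benchmarking two pipelines on that basis follows. The pipeline labels, sample IDs, and read counts are made-up toy values, not results from the study, and using an RPM-only criterion is an assumption for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical per-sample results from one pipeline:
# sample ID -> (reads assigned to the known pathogen, total reads in the dataset).
Results = Dict[str, Tuple[int, int]]


def recall(results: Results, min_rpm: float) -> float:
    """Fraction of known-positive samples whose pathogen reaches `min_rpm`."""
    hits = sum(1 for count, total in results.values()
               if count * 1_000_000 / total >= min_rpm)
    return hits / len(results)


def compare_pipelines(by_pipeline: Dict[str, Results],
                      thresholds: List[float]) -> Dict[str, List[float]]:
    """Recall of each pipeline at each RPM threshold, for side-by-side comparison."""
    return {name: [recall(res, t) for t in thresholds]
            for name, res in by_pipeline.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only (20 M reads per sample).
    toy = {
        "pipeline_A": {"s1": (40, 20_000_000), "s2": (4, 20_000_000), "s3": (0, 20_000_000)},
        "pipeline_B": {"s1": (10, 20_000_000), "s2": (2, 20_000_000), "s3": (0, 20_000_000)},
    }
    for name, recalls in compare_pipelines(toy, [0.2, 2]).items():
        print(name, [round(r, 2) for r in recalls])
```

When both pipelines are run on the same simulated datasets, any gap between their recall values reflects the classification step rather than the sequencing strategy, which is why comparing tools on identical inputs is useful.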
ISSN: 2405-8440
DOI: 10.1016/j.heliyon.2024.e33429