Pseudoalignment for metagenomic read assignment

Bibliographic Details
Published in: Bioinformatics (Oxford, England), 2017-07, Vol.33 (14), p.2082-2088
Main Authors: Schaeffer, L, Pimentel, H, Bray, N, Melsted, P, Pachter, L
Format: Article
Language:English
Description
Summary:Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains. We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When pseudoalignment is coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects. Pipeline and analysis code can be downloaded from http://github.com/pachterlab/metakallisto. Contact: lpachter@math.berkeley.edu.
ISSN:1367-4803, 1367-4811
DOI:10.1093/bioinformatics/btx106
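
Illustration: the summary describes coupling pseudoalignment with the Expectation-Maximization (EM) algorithm to resolve reads that are compatible with several strains. The Python sketch below shows one common form of that idea and is not the authors' pipeline. It assumes pseudoalignment has already grouped reads into equivalence classes (sets of compatible genomes with read counts), and it assumes all genomes have equal length, so no length normalization is applied. The function name `em_abundances` and the input layout are hypothetical.

```python
def em_abundances(eq_classes, n_genomes, n_iter=200, tol=1e-9):
    """Estimate relative genome abundances from equivalence-class counts.

    eq_classes: dict mapping a tuple of genome indices (the genomes a
                group of reads is compatible with) to a read count.
    Assumption: equal genome lengths, so abundances are read fractions.
    """
    total = sum(eq_classes.values())
    alpha = [1.0 / n_genomes] * n_genomes  # uniform starting guess

    for _ in range(n_iter):
        new = [0.0] * n_genomes
        # E-step: split each class's reads among its genomes in
        # proportion to the current abundance estimates.
        for genomes, count in eq_classes.items():
            denom = sum(alpha[g] for g in genomes)
            if denom == 0.0:
                continue
            for g in genomes:
                new[g] += count * alpha[g] / denom
        # M-step: renormalize the expected counts into fractions.
        new = [x / total for x in new]
        delta = max(abs(a - b) for a, b in zip(alpha, new))
        alpha = new
        if delta < tol:
            break
    return alpha


# Toy input: 30 reads unique to genome 0, 10 unique to genome 1,
# and 60 reads compatible with both.
toy_classes = {(0,): 30, (1,): 10, (0, 1): 60}
abundances = em_abundances(toy_classes, n_genomes=2)
```

In this toy input the 60 ambiguous reads are shared out according to the evidence from the unique reads, so the estimates settle near 0.75 for genome 0 and 0.25 for genome 1. A real analysis would also account for genome length and would take its equivalence classes from a pseudoaligner rather than a hand-written dictionary.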