
Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]

Bibliographic Details
Published in: F1000Research, 2019, Vol. 8, p. 1006
Main Authors: Pierce, N. Tessa; Irber, Luiz; Reiter, Taylor; Brooks, Phillip; Brown, C. Titus
Format: Article
Language: English
Description
Summary: The sourmash software package uses MinHash-based sketching to create "signatures", compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
ISSN: 2046-1402
DOI: 10.12688/f1000research.19675.1
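
Illustration: The summary describes MinHash-based sketching as a way to estimate sequence similarity from compressed signatures. The following is a minimal sketch of that general idea, not sourmash's own implementation or API. It assumes a bottom-k MinHash over DNA k-mers; the function names (`kmers`, `hash_kmer`, `sketch`, `estimate_jaccard`), the choice of SHA-1 from the standard library as the hash, and the parameters `k` and `num` are all hypothetical choices made for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer.

    Assumption: SHA-1 is used only to keep the example self-contained;
    real tools typically use a faster non-cryptographic hash.
    """
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=5, num=50):
    """Bottom-k MinHash sketch: keep the num smallest k-mer hashes."""
    hashes = sorted({hash_kmer(km) for km in kmers(seq, k)})
    return set(hashes[:num])


def estimate_jaccard(a, b, num=50):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the num smallest hashes of the combined sketches and count
    the fraction that appear in both.
    """
    union_bottom = sorted(a | b)[:num]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)


if __name__ == "__main__":
    s1 = sketch("ACGTACGTAC")
    s2 = sketch("ACGTAAAAAA")
    print(estimate_jaccard(s1, s2))
```

Only the sketches, not the full sequences, are compared, which is why this kind of estimate can be computed quickly and in little memory. When a sequence has fewer than `num` distinct k-mers, as in the toy inputs above, the sketch holds every hash and the estimate equals the exact Jaccard similarity of the k-mer sets.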