
DySC: software for greedy clustering of 16S rRNA reads

Bibliographic Details
Published in: Bioinformatics 2012-08, Vol.28 (16), p.2182-2183
Main Authors: ZEJUN ZHENG; KRAMER, Stefan; SCHMIDT, Bertil
Format: Article
Language: English
container_end_page 2183
container_issue 16
container_start_page 2182
container_title Bioinformatics
container_volume 28
creator ZEJUN ZHENG
KRAMER, Stefan
SCHMIDT, Bertil
description Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime. DySC, implemented in C, is available at http://code.google.com/p/dysc/ under GNU GPL license.
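The description above names normalized mutual information (NMI) as the evaluation criterion but does not state the formula. As a rough illustration only, the sketch below computes NMI between two cluster labelings of the same reads. It assumes the arithmetic-mean normalization, NMI = 2 I(A;B) / (H(A) + H(B)), which may differ from the variant used in the paper; the function names are hypothetical and are not taken from DySC.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same items")
    if not a:
        raise ValueError("labelings must be non-empty")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        # Both labelings put every item in one cluster, so they agree.
        return 1.0
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]   # hypothetical ground-truth clusters
    predicted = [1, 1, 0, 0]   # same partition with swapped labels
    print(nmi(reference, predicted))
```

Under this definition, two labelings that describe the same partition score 1 regardless of how the clusters are named, and two statistically independent labelings score 0.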
doi_str_mv 10.1093/bioinformatics/bts355
format article
fullrecord DySC: software for greedy clustering of 16S rRNA reads. ZEJUN ZHENG; KRAMER, Stefan; SCHMIDT, Bertil. Bioinformatics, 2012-08-15, Vol.28 (16), p.2182-2183. Oxford: Oxford University Press. ISSN: 1367-4803; EISSN: 1367-4811; EISSN: 1460-2059. DOI: 10.1093/bioinformatics/bts355. PMID: 22730435 (https://www.ncbi.nlm.nih.gov/pubmed/22730435). Peer reviewed; free to read. Rights: 2015 INIST-CNRS.
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2012-08, Vol.28 (16), p.2182-2183
issn 1367-4803
1367-4811
1460-2059
language eng
recordid cdi_proquest_miscellaneous_1671554707
source Oxford Open; PubMed Central
subjects Bioinformatics
Biological and medical sciences
Cadmium
Cluster Analysis
Clustering
Computer programs
Dynamics
Fundamental and applied biological sciences. Psychology
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Metagenome
Microorganisms
RNA, Ribosomal, 16S - genetics
Run time (computers)
Sequence Analysis, RNA - methods
Software
title DySC: software for greedy clustering of 16S rRNA reads