
MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis


Bibliographic Details
Published in: Briefings in Bioinformatics 2019-07, Vol.20 (4), p.1151-1159
Main Authors: Meyer, Folker, Bagchi, Saurabh, Chaterji, Somali, Gerlach, Wolfgang, Grama, Ananth, Harrison, Travis, Paczian, Tobias, Trimble, William L, Wilke, Andreas
Format: Article
Language: English
Abstract: As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is the right place to perform benchmarking and to implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example: we use existing well-studied data sets as gold standards, representing different environments and different technologies, to evaluate any change to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. Using artificial data sets for pipeline performance optimization has not added value, because these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which incorporate significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language (CWL) as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient high-performance implementation of the community’s data analysis tasks.
DOI: 10.1093/bib/bbx105
PMID: 29028869
Publisher: Oxford University Press (England)
Publication date: 2019-07-19
Rights: Published by Oxford University Press on behalf of Entomological Society of America 2017. This work is written by US Government employees and is in the public domain in the US.
Full text (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781595/
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/29028869
ISSN: 1467-5463
EISSN: 1477-4054
Source: Oxford University Press Open Access
Subjects:
Algorithms
Benchmarks
Bioinformatics
Budgets
Computational Biology - methods
Data analysis
Datasets
Diamonds
High-Throughput Nucleotide Sequencing - economics
High-Throughput Nucleotide Sequencing - methods
High-Throughput Nucleotide Sequencing - statistics & numerical data
Internet
Metagenome
Metagenomics - economics
Metagenomics - methods
Metagenomics - statistics & numerical data
Sensitivity
Sequence Analysis, DNA - economics
Sequence Analysis, DNA - methods
Sequence Analysis, DNA - statistics & numerical data
Software
User-Computer Interface
Workflow