MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis
Abstract As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or pl...
Published in: | Briefings in bioinformatics 2019-07, Vol.20 (4), p.1151-1159 |
---|---|
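Main Authors: | Meyer, Folker; Bagchi, Saurabh; Chaterji, Somali; Gerlach, Wolfgang; Grama, Ananth; Harrison, Travis; Paczian, Tobias; Trimble, William L; Wilke, Andreas |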
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093 |
---|---|
cites | cdi_FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093 |
container_end_page | 1159 |
container_issue | 4 |
container_start_page | 1151 |
container_title | Briefings in bioinformatics |
container_volume | 20 |
creator | Meyer, Folker Bagchi, Saurabh Chaterji, Somali Gerlach, Wolfgang Grama, Ananth Harrison, Travis Paczian, Tobias Trimble, William L Wilke, Andreas |
description | Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, to facilitate both development and efficient high-performance implementation of the community’s data analysis tasks. |
doi_str_mv | 10.1093/bib/bbx105 |
format | article |
fullrecord | <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6781595</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bib/bbx105</oup_id><sourcerecordid>2955228575</sourcerecordid><originalsourceid>FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093</originalsourceid><addsrcrecordid>eNp9kdFqFTEQhhex2Fq98QEkIIIIa5Nskt3cCKVoLVQErdch2czupmQ3x2RT7Z0P4RP2SZrDqUW98CpD5uOff-avqmcEvyFYNkfGmSNjfhDMH1QHhLVtzTBnD7e1aGvORLNfPU7pEmOK2448qvapxLTrhDyo3MfT-vPxlwt0BTG5sCB28_OXh5TCkpAHHRewaIhhRhpZ6LUFFAbkw_faZDvCirJfo64nN071OsWQx2mTVzTDqkdYwgxIL9pfJ5eeVHuD9gme3r2H1df37y5OPtTnn07PTo7P6541dK3B2EEKKYm1mAvdado0BDPDdfk3ghpGhZGitVTYhpFWAjGGdtCU3UCUcxxWb3e6m2xmsD0sxaBXm-hmHa9V0E793VncpMZwpUS5DZe8CLy6E4jhW4a0qtmlHrzXC4ScFJGclMGC0YK--Ae9DDmWhZOiknNKO95uBV_vqD6GlCIM92YIVtsEVUlQ7RIs8PM_7d-jvyMrwMsdEPLmf0K3N76l5A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2955228575</pqid></control><display><type>article</type><title>MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis</title><source>Oxford University Press Open Access</source><creator>Meyer, Folker ; Bagchi, Saurabh ; Chaterji, Somali ; Gerlach, Wolfgang ; Grama, Ananth ; Harrison, Travis ; Paczian, Tobias ; Trimble, William L ; Wilke, Andreas</creator><creatorcontrib>Meyer, Folker ; Bagchi, Saurabh ; Chaterji, Somali ; Gerlach, Wolfgang ; Grama, Ananth ; Harrison, Travis ; Paczian, Tobias ; Trimble, William L ; Wilke, Andreas</creatorcontrib><description>Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets are not presenting the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. 
On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community’s data analysis tasks.</description><identifier>ISSN: 1467-5463</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbx105</identifier><identifier>PMID: 29028869</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Benchmarks ; Bioinformatics ; Budgets ; Computational Biology - methods ; Data analysis ; Datasets ; Diamonds ; High-Throughput Nucleotide Sequencing - economics ; High-Throughput Nucleotide Sequencing - methods ; High-Throughput Nucleotide Sequencing - statistics & numerical data ; Internet ; Metagenome ; Metagenomics - economics ; Metagenomics - methods ; Metagenomics - statistics & numerical data ; Sensitivity ; Sequence Analysis, DNA - economics ; Sequence Analysis, DNA - methods ; Sequence Analysis, DNA - statistics & numerical data ; Software ; User-Computer Interface ; Workflow</subject><ispartof>Briefings in bioinformatics, 2019-07, Vol.20 (4), p.1151-1159</ispartof><rights>Published by Oxford University Press on behalf of Entomological Society of America 2017. This work is written by US Government employees and is in the public domain in the US. 2017</rights><rights>Published by Oxford University Press on behalf of Entomological Society of America 2017. 
This work is written by US Government employees and is in the public domain in the US.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093</citedby><cites>FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093</cites><orcidid>0000-0002-3651-6362</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781595/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781595/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,1604,27924,27925,53791,53793</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bib/bbx105$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29028869$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Meyer, Folker</creatorcontrib><creatorcontrib>Bagchi, Saurabh</creatorcontrib><creatorcontrib>Chaterji, Somali</creatorcontrib><creatorcontrib>Gerlach, Wolfgang</creatorcontrib><creatorcontrib>Grama, Ananth</creatorcontrib><creatorcontrib>Harrison, Travis</creatorcontrib><creatorcontrib>Paczian, Tobias</creatorcontrib><creatorcontrib>Trimble, William L</creatorcontrib><creatorcontrib>Wilke, Andreas</creatorcontrib><title>MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets are not presenting the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. 
On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community’s data analysis tasks.</description><subject>Algorithms</subject><subject>Benchmarks</subject><subject>Bioinformatics</subject><subject>Budgets</subject><subject>Computational Biology - methods</subject><subject>Data analysis</subject><subject>Datasets</subject><subject>Diamonds</subject><subject>High-Throughput Nucleotide Sequencing - economics</subject><subject>High-Throughput Nucleotide Sequencing - methods</subject><subject>High-Throughput Nucleotide Sequencing - statistics & numerical data</subject><subject>Internet</subject><subject>Metagenome</subject><subject>Metagenomics - economics</subject><subject>Metagenomics - methods</subject><subject>Metagenomics - statistics & numerical data</subject><subject>Sensitivity</subject><subject>Sequence Analysis, DNA - economics</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Sequence Analysis, DNA - statistics & numerical data</subject><subject>Software</subject><subject>User-Computer 
Interface</subject><subject>Workflow</subject><issn>1467-5463</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kdFqFTEQhhex2Fq98QEkIIIIa5Nskt3cCKVoLVQErdch2czupmQ3x2RT7Z0P4RP2SZrDqUW98CpD5uOff-avqmcEvyFYNkfGmSNjfhDMH1QHhLVtzTBnD7e1aGvORLNfPU7pEmOK2448qvapxLTrhDyo3MfT-vPxlwt0BTG5sCB28_OXh5TCkpAHHRewaIhhRhpZ6LUFFAbkw_faZDvCirJfo64nN071OsWQx2mTVzTDqkdYwgxIL9pfJ5eeVHuD9gme3r2H1df37y5OPtTnn07PTo7P6541dK3B2EEKKYm1mAvdado0BDPDdfk3ghpGhZGitVTYhpFWAjGGdtCU3UCUcxxWb3e6m2xmsD0sxaBXm-hmHa9V0E793VncpMZwpUS5DZe8CLy6E4jhW4a0qtmlHrzXC4ScFJGclMGC0YK--Ae9DDmWhZOiknNKO95uBV_vqD6GlCIM92YIVtsEVUlQ7RIs8PM_7d-jvyMrwMsdEPLmf0K3N76l5A</recordid><startdate>20190719</startdate><enddate>20190719</enddate><creator>Meyer, Folker</creator><creator>Bagchi, Saurabh</creator><creator>Chaterji, Somali</creator><creator>Gerlach, Wolfgang</creator><creator>Grama, Ananth</creator><creator>Harrison, Travis</creator><creator>Paczian, Tobias</creator><creator>Trimble, William L</creator><creator>Wilke, Andreas</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-3651-6362</orcidid></search><sort><creationdate>20190719</creationdate><title>MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis</title><author>Meyer, Folker ; Bagchi, Saurabh ; Chaterji, Somali ; Gerlach, Wolfgang ; Grama, Ananth ; Harrison, Travis ; Paczian, Tobias ; Trimble, William L ; Wilke, 
Andreas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Benchmarks</topic><topic>Bioinformatics</topic><topic>Budgets</topic><topic>Computational Biology - methods</topic><topic>Data analysis</topic><topic>Datasets</topic><topic>Diamonds</topic><topic>High-Throughput Nucleotide Sequencing - economics</topic><topic>High-Throughput Nucleotide Sequencing - methods</topic><topic>High-Throughput Nucleotide Sequencing - statistics & numerical data</topic><topic>Internet</topic><topic>Metagenome</topic><topic>Metagenomics - economics</topic><topic>Metagenomics - methods</topic><topic>Metagenomics - statistics & numerical data</topic><topic>Sensitivity</topic><topic>Sequence Analysis, DNA - economics</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Sequence Analysis, DNA - statistics & numerical data</topic><topic>Software</topic><topic>User-Computer Interface</topic><topic>Workflow</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Meyer, Folker</creatorcontrib><creatorcontrib>Bagchi, Saurabh</creatorcontrib><creatorcontrib>Chaterji, Somali</creatorcontrib><creatorcontrib>Gerlach, Wolfgang</creatorcontrib><creatorcontrib>Grama, Ananth</creatorcontrib><creatorcontrib>Harrison, Travis</creatorcontrib><creatorcontrib>Paczian, Tobias</creatorcontrib><creatorcontrib>Trimble, William L</creatorcontrib><creatorcontrib>Wilke, Andreas</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information 
Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Meyer, Folker</au><au>Bagchi, Saurabh</au><au>Chaterji, Somali</au><au>Gerlach, Wolfgang</au><au>Grama, Ananth</au><au>Harrison, Travis</au><au>Paczian, Tobias</au><au>Trimble, William L</au><au>Wilke, Andreas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2019-07-19</date><risdate>2019</risdate><volume>20</volume><issue>4</issue><spage>1151</spage><epage>1159</epage><pages>1151-1159</pages><issn>1467-5463</issn><eissn>1477-4054</eissn><abstract>Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets are not presenting the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community’s data analysis tasks.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>29028869</pmid><doi>10.1093/bib/bbx105</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-3651-6362</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1467-5463 |
ispartof | Briefings in bioinformatics, 2019-07, Vol.20 (4), p.1151-1159 |
issn | 1467-5463 1477-4054 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6781595 |
source | Oxford University Press Open Access |
subjects | Algorithms Benchmarks Bioinformatics Budgets Computational Biology - methods Data analysis Datasets Diamonds High-Throughput Nucleotide Sequencing - economics High-Throughput Nucleotide Sequencing - methods High-Throughput Nucleotide Sequencing - statistics & numerical data Internet Metagenome Metagenomics - economics Metagenomics - methods Metagenomics - statistics & numerical data Sensitivity Sequence Analysis, DNA - economics Sequence Analysis, DNA - methods Sequence Analysis, DNA - statistics & numerical data Software User-Computer Interface Workflow |
title | MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T20%3A56%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MG-RAST%20version%204%E2%80%94lessons%20learned%20from%20a%20decade%20of%20low-budget%20ultra-high-throughput%20metagenome%20analysis&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Meyer,%20Folker&rft.date=2019-07-19&rft.volume=20&rft.issue=4&rft.spage=1151&rft.epage=1159&rft.pages=1151-1159&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbx105&rft_dat=%3Cproquest_TOX%3E2955228575%3C/proquest_TOX%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c432t-ebdf96991dd056a8a233104b5adf9b62b426b967d26d34179e1bb28e3207e6093%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2955228575&rft_id=info:pmid/29028869&rft_oup_id=10.1093/bib/bbx105&rfr_iscdi=true |
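
The `url` row above is an OpenURL whose query string carries the citation as key/value pairs (`rft.*` keys for the cited item, repeated `rft_id` keys for identifiers). As a minimal sketch, assuming only the Python standard library, the fields can be pulled back out as follows. The `OPENURL` constant is a shortened copy of the link with only some keys kept, and `parse_openurl` is a hypothetical helper name, not part of any catalog API.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the resolver link in the `url` row (illustrative subset of keys).
OPENURL = (
    "http://sfxeu10.hosted.exlibrisgroup.com/loughborough"
    "?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=Briefings%20in%20bioinformatics"
    "&rft.au=Meyer,%20Folker"
    "&rft.date=2019-07-19"
    "&rft.volume=20&rft.issue=4&rft.spage=1151&rft.epage=1159"
    "&rft.issn=1467-5463&rft.eissn=1477-4054"
    "&rft_id=info:doi/10.1093/bib/bbx105"
    "&rft_id=info:pmid/29028869"
)


def parse_openurl(url):
    """Return (referent fields, identifiers) decoded from an OpenURL query string."""
    # parse_qs percent-decodes values and collects repeated keys into lists.
    query = parse_qs(urlsplit(url).query)

    # Keys starting with "rft." describe the cited item; keep the first value of each.
    fields = {key[4:]: values[0] for key, values in query.items() if key.startswith("rft.")}

    # rft_id repeats, so walk every value and split "info:<scheme>/<value>".
    ids = {}
    for raw in query.get("rft_id", []):
        body = raw[5:] if raw.startswith("info:") else raw
        scheme, _, value = body.partition("/")
        if value:  # skip identifiers with an empty value
            ids[scheme] = value
    return fields, ids


fields, ids = parse_openurl(OPENURL)
print(fields["jtitle"], fields["volume"], fields["spage"], ids["doi"])
```

Splitting only on the first `/` matters here because the DOI itself contains slashes; repeated `rft_id` keys are the reason a plain dictionary of single values would lose either the DOI or the PMID.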