Unexpected cross-species contamination in genome sequencing projects

Bibliographic Details
Published in: PeerJ (San Francisco, CA), 2014-11, Vol.2, p.e675-e675
Main Authors: Merchant, Samier; Wood, Derrick E; Salzberg, Steven L
Format: Article
Language: English
container_end_page e675
container_issue
container_start_page e675
container_title PeerJ (San Francisco, CA)
container_volume 2
creator Merchant, Samier
Wood, Derrick E
Salzberg, Steven L
description The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes.
doi_str_mv 10.7717/peerj.675
format article
publisher United States: PeerJ. Ltd
pmid 25426337
date 2014-11-20
toplevel peer_reviewed
oa free_for_read
rights 2014 Merchant et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
fulltext fulltext
identifier ISSN: 2167-8359
ispartof PeerJ (San Francisco, CA), 2014-11, Vol.2, p.e675-e675
issn 2167-8359
2167-8359
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_de3fa92d5ead4caa8db2598b90c69178
source Publicly Available Content Database; PubMed Central; Coronavirus Research Database
subjects Analysis
Anopheles
Bacteria
Bioinformatics
Biology
Bos taurus
Cattle
Chromosomes
Computational Biology
Computer science
Contaminants
Contamination
Deoxyribonucleic acid
DNA
DNA sequencing
Experiments
Genome assembly
Genomes
Genomics
Microbiology
Microbiome
Neisseria gonorrhoeae
Nucleotide sequence
Organisms
Sequence analysis
Sheep
Software
Studies
title Unexpected cross-species contamination in genome sequencing projects
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T19%3A25%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Unexpected%20cross-species%20contamination%20in%20genome%20sequencing%20projects&rft.jtitle=PeerJ%20(San%20Francisco,%20CA)&rft.au=Merchant,%20Samier&rft.date=2014-11-20&rft.volume=2&rft.spage=e675&rft.epage=e675&rft.pages=e675-e675&rft.issn=2167-8359&rft.eissn=2167-8359&rft_id=info:doi/10.7717/peerj.675&rft_dat=%3Cgale_doaj_%3EA543541531%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c567t-df786bcc5522f350103c274105daa85172e3c39a8d8d5b5645694b1c649c3ae83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1959035052&rft_id=info:pmid/25426337&rft_galeid=A543541531&rfr_iscdi=true