cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets

Bibliographic Details
Published in: BMC bioinformatics 2021-02, Vol.22 (1), p.70-70, Article 70
Main Authors: Crawford, Ryan D; Snitkin, Evan S
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93
cites cdi_FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93
container_end_page 70
container_issue 1
container_start_page 70
container_title BMC bioinformatics
container_volume 22
creator Crawford, Ryan D
Snitkin, Evan S
description The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. We illustrate that cognac is able to rapidly identify phylogenetic marker genes using a data-driven approach and efficiently generate concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight unique genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia producing an alignment with 1353 genes, which was constructed in less than 17 h. We demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.
doi_str_mv 10.1186/s12859-021-03981-4
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_d015fd73dff04d3dabe8e331920223f2</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A653602484</galeid><doaj_id>oai_doaj_org_article_d015fd73dff04d3dabe8e331920223f2</doaj_id><sourcerecordid>A653602484</sourcerecordid><originalsourceid>FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93</originalsourceid><addsrcrecordid>eNptkl2L1DAUhoso7jr6B7yQgDcKdk2apk33QlgWPwYWBD-uw2ly0snQJmPSUfcv-KvNzKzrjkgoKTnPeUPe8xbFU0bPGJPN68QqKbqSVqykvJOsrO8Vp6xuWVkxKu7f-T8pHqW0ppS1koqHxQnnQspW8NPilw6DB31OImycIQN6jDC74EmwRAevYUafv0OJwOgGP6GfE7Ehks3qegy7wuw0cd5iRK-R2BgmMkIc8BXpQc8YHYzkxyqMuJMJE5KE37aZdX4gBmZIOKfHxQMLY8InN_ui-Pru7ZfLD-XVx_fLy4urUouunUursdFNj53ohbaCMuyobjVF3rTZDqh7bo20Fe8YYwKEybg1Rjeair6Fji-K5UHXBFirTXQTxGsVwKn9QYiDgpgfNKIylAlrWm6spbXhBnqUyDnrKlpVPN-xKN4ctDbbfkKjszMRxiPR44p3KzWE76qVUvBaZIEXNwIxZEfSrCaXNI4jeAzbpKq6o6yqJZMZff4Pug7b6LNVe6pjVR7qX2qA_IA8k5Dv1TtRddEI3tAsVmfq7D9UXgYnl8eO1uXzo4aXRw2ZmfHnPMA2JbX8_OmYrQ6sjiGliPbWD0bVLrnqkFyVk6v2yVW7pmd3nbxt-RNV_hs-_ur8</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2490912588</pqid></control><display><type>article</type><title>cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets</title><source>Publicly Available Content (ProQuest)</source><source>PubMed Central</source><creator>Crawford, Ryan D ; Snitkin, Evan S</creator><creatorcontrib>Crawford, Ryan D ; Snitkin, Evan S</creatorcontrib><description>The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. 
We illustrate that cognac is able to rapidly identify phylogenetic marker genes using a data driven approach and efficiently generate concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight unique genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia producing an alignment with 1353 genes, which was constructed in less than 17 h. We demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package ( https://github.com/rdcrawford/cognac ) with customizable parameters for adaptation to diverse applications.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-021-03981-4</identifier><identifier>PMID: 33588753</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Amino acids ; Analysis ; Bacteria ; Bacteria - classification ; Bacteria - genetics ; Bacterial genetics ; Biology ; Concatenated gene tree ; Core genome ; Databases, Genetic ; Datasets ; Family Characteristics ; Gene sequencing ; Genes ; Genome, Bacterial ; Genomes ; Multiple sequence alignment ; Phylogenetics ; Phylogeny ; Prokaryotes ; Software ; Trees ; Whole Genome Sequencing</subject><ispartof>BMC bioinformatics, 2021-02, Vol.22 (1), p.70-70, Article 70</ispartof><rights>COPYRIGHT 2021 BioMed Central Ltd.</rights><rights>2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93</citedby><cites>FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93</cites><orcidid>0000-0001-8409-278X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885345/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2490912588?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33588753$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Crawford, Ryan D</creatorcontrib><creatorcontrib>Snitkin, Evan S</creatorcontrib><title>cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. 
We illustrate that cognac is able to rapidly identify phylogenetic marker genes using a data driven approach and efficiently generate concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight unique genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia producing an alignment with 1353 genes, which was constructed in less than 17 h. We demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package ( https://github.com/rdcrawford/cognac ) with customizable parameters for adaptation to diverse applications.</description><subject>Amino acids</subject><subject>Analysis</subject><subject>Bacteria</subject><subject>Bacteria - classification</subject><subject>Bacteria - genetics</subject><subject>Bacterial genetics</subject><subject>Biology</subject><subject>Concatenated gene tree</subject><subject>Core genome</subject><subject>Databases, Genetic</subject><subject>Datasets</subject><subject>Family Characteristics</subject><subject>Gene sequencing</subject><subject>Genes</subject><subject>Genome, Bacterial</subject><subject>Genomes</subject><subject>Multiple sequence alignment</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Prokaryotes</subject><subject>Software</subject><subject>Trees</subject><subject>Whole Genome 
Sequencing</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkl2L1DAUhoso7jr6B7yQgDcKdk2apk33QlgWPwYWBD-uw2ly0snQJmPSUfcv-KvNzKzrjkgoKTnPeUPe8xbFU0bPGJPN68QqKbqSVqykvJOsrO8Vp6xuWVkxKu7f-T8pHqW0ppS1koqHxQnnQspW8NPilw6DB31OImycIQN6jDC74EmwRAevYUafv0OJwOgGP6GfE7Ehks3qegy7wuw0cd5iRK-R2BgmMkIc8BXpQc8YHYzkxyqMuJMJE5KE37aZdX4gBmZIOKfHxQMLY8InN_ui-Pru7ZfLD-XVx_fLy4urUouunUursdFNj53ohbaCMuyobjVF3rTZDqh7bo20Fe8YYwKEybg1Rjeair6Fji-K5UHXBFirTXQTxGsVwKn9QYiDgpgfNKIylAlrWm6spbXhBnqUyDnrKlpVPN-xKN4ctDbbfkKjszMRxiPR44p3KzWE76qVUvBaZIEXNwIxZEfSrCaXNI4jeAzbpKq6o6yqJZMZff4Pug7b6LNVe6pjVR7qX2qA_IA8k5Dv1TtRddEI3tAsVmfq7D9UXgYnl8eO1uXzo4aXRw2ZmfHnPMA2JbX8_OmYrQ6sjiGliPbWD0bVLrnqkFyVk6v2yVW7pmd3nbxt-RNV_hs-_ur8</recordid><startdate>20210215</startdate><enddate>20210215</enddate><creator>Crawford, Ryan D</creator><creator>Snitkin, Evan S</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-8409-278X</orcidid></search><sort><creationdate>20210215</creationdate><title>cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets</title><author>Crawford, Ryan D ; Snitkin, Evan S</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Amino acids</topic><topic>Analysis</topic><topic>Bacteria</topic><topic>Bacteria - classification</topic><topic>Bacteria - genetics</topic><topic>Bacterial 
genetics</topic><topic>Biology</topic><topic>Concatenated gene tree</topic><topic>Core genome</topic><topic>Databases, Genetic</topic><topic>Datasets</topic><topic>Family Characteristics</topic><topic>Gene sequencing</topic><topic>Genes</topic><topic>Genome, Bacterial</topic><topic>Genomes</topic><topic>Multiple sequence alignment</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Prokaryotes</topic><topic>Software</topic><topic>Trees</topic><topic>Whole Genome Sequencing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Crawford, Ryan D</creatorcontrib><creatorcontrib>Snitkin, Evan S</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health &amp; Medical Collection (Proquest)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace 
Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content (ProQuest)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant 
titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Crawford, Ryan D</au><au>Snitkin, Evan S</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2021-02-15</date><risdate>2021</risdate><volume>22</volume><issue>1</issue><spage>70</spage><epage>70</epage><pages>70-70</pages><artnum>70</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. We illustrate that cognac is able to rapidly identify phylogenetic marker genes using a data driven approach and efficiently generate concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight unique genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia producing an alignment with 1353 genes, which was constructed in less than 17 h. We demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. 
We have released cognac as an R package ( https://github.com/rdcrawford/cognac ) with customizable parameters for adaptation to diverse applications.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>33588753</pmid><doi>10.1186/s12859-021-03981-4</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-8409-278X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2021-02, Vol.22 (1), p.70-70, Article 70
issn 1471-2105
1471-2105
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_d015fd73dff04d3dabe8e331920223f2
source Publicly Available Content (ProQuest); PubMed Central
subjects Amino acids
Analysis
Bacteria
Bacteria - classification
Bacteria - genetics
Bacterial genetics
Biology
Concatenated gene tree
Core genome
Databases, Genetic
Datasets
Family Characteristics
Gene sequencing
Genes
Genome, Bacterial
Genomes
Multiple sequence alignment
Phylogenetics
Phylogeny
Prokaryotes
Software
Trees
Whole Genome Sequencing
title cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T20%3A30%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=cognac:%20rapid%20generation%20of%20concatenated%20gene%20alignments%20for%20phylogenetic%20inference%20from%20large,%20bacterial%20whole%20genome%20sequencing%20datasets&rft.jtitle=BMC%20bioinformatics&rft.au=Crawford,%20Ryan%20D&rft.date=2021-02-15&rft.volume=22&rft.issue=1&rft.spage=70&rft.epage=70&rft.pages=70-70&rft.artnum=70&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-021-03981-4&rft_dat=%3Cgale_doaj_%3EA653602484%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c597t-fce6c6be95b5cf501e90c7c0e367859a4b3fd8f2391115a5de6cfddc6c05b7a93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2490912588&rft_id=info:pmid/33588753&rft_galeid=A653602484&rfr_iscdi=true