
Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper


Bibliographic Details
Published in:Bioinformatics (Oxford, England), 2022-01, Vol.38 (3), p.612-620
Main Authors: McFarland, Alexander G, Kennedy, Nolan W, Mills, Carolyn E, Tullman-Ercek, Danielle, Huttenhower, Curtis, Hartmann, Erica M
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3
cites cdi_FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3
container_end_page 620
container_issue 3
container_start_page 612
container_title Bioinformatics (Oxford, England)
container_volume 38
creator McFarland, Alexander G
Kennedy, Nolan W
Mills, Carolyn E
Tullman-Ercek, Danielle
Huttenhower, Curtis
Hartmann, Erica M
description Abstract. Motivation: Identifying variant forms of gene clusters of interest in phylogenetically proximate and distant taxa can help to infer their evolutionary histories and functions. Conserved gene clusters may differ by only a few genes, but these small differences can in turn induce substantial phenotypes, such as by the formation of pseudogenes or insertions interrupting regulation. Particularly as microbial genomes and metagenomic assemblies become increasingly abundant, unsupervised grouping of similar, but not necessarily identical, gene clusters into consistent bins can provide a population-level understanding of their gene content variation and functional homology. Results: We developed GeneGrouper, a command-line tool that uses a density-based clustering method to group gene clusters into bins. GeneGrouper demonstrated high recall and precision in benchmarks for the detection of the 23-gene Salmonella enterica LT2 Pdu gene cluster and four-gene Pseudomonas aeruginosa PAO1 Mex gene cluster among 435 genomes spanning mixed taxa. In a subsequent application investigating the diversity and impact of gene-complete and -incomplete LT2 Pdu gene clusters in 1130 S. enterica genomes, GeneGrouper identified a novel, frequently occurring pduN pseudogene. When investigated in vivo, introduction of the pduN pseudogene negatively impacted microcompartment formation. We next demonstrated the versatility of GeneGrouper by clustering distant homologous gene clusters and variable gene clusters found in integrative and conjugative elements. Availability and implementation: GeneGrouper software and code are publicly available at https://pypi.org/project/GeneGrouper/. Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/btab752
format article
fullrecord <record><control><sourceid>oup_TOX</sourceid><recordid>TN_cdi_crossref_primary_10_1093_bioinformatics_btab752</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btab752</oup_id><sourcerecordid>10.1093/bioinformatics/btab752</sourcerecordid><originalsourceid>FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3</originalsourceid><addsrcrecordid>eNqNUNFKAzEQDKLYWv2Fkh84m1zuNr1HqVqFgi_6fFxymxppk5LkhP69Ka0F34SF2YWZ2d0hZMrZPWeNmCnrrTM-bLtkdZyp1ClZlxdkzAXIoppzfnnumRiRmxi_GGM1q-GajEQlRdXAfEzUI7po075QXcSeKuucdWvqDV2jQ6o3Q0wYIk2e5n0YqBmcTtY76gPFb78ZDkMX9vTTxuQzDvFgsMzqZfDDDsMtuTLdJuLdCSfk4_npffFSrN6Wr4uHVaErxlOhS9PVMMdeCmkAckFZ9xXko1EDgmaS9UpVtex6NKARJFRKND3nwHmjxITA0VcHH2NA0-6C3ebTWs7aQ2jt39DaU2hZOD0Kd4PaYn-W_aaUCfxIyP_81_QHTm6Cgg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper</title><source>Open Access: Oxford University Press Open Journals</source><creator>McFarland, Alexander G ; Kennedy, Nolan W ; Mills, Carolyn E ; Tullman-Ercek, Danielle ; Huttenhower, Curtis ; Hartmann, Erica M</creator><contributor>Robinson, Peter</contributor><creatorcontrib>McFarland, Alexander G ; Kennedy, Nolan W ; Mills, Carolyn E ; Tullman-Ercek, Danielle ; Huttenhower, Curtis ; Hartmann, Erica M ; Robinson, Peter</creatorcontrib><description>Abstract Motivation Identifying variant forms of gene clusters of interest in phylogenetically proximate and distant taxa can help to infer their evolutionary histories and functions. Conserved gene clusters may differ by only a few genes, but these small differences can in turn induce substantial phenotypes, such as by the formation of pseudogenes or insertions interrupting regulation. 
Particularly as microbial genomes and metagenomic assemblies become increasingly abundant, unsupervised grouping of similar, but not necessarily identical, gene clusters into consistent bins can provide a population-level understanding of their gene content variation and functional homology. Results We developed GeneGrouper, a command-line tool that uses a density-based clustering method to group gene clusters into bins. GeneGrouper demonstrated high recall and precision in benchmarks for the detection of the 23-gene Salmonella enterica LT2 Pdu gene cluster and four-gene Pseudomonas aeruginosa PAO1 Mex gene cluster among 435 genomes spanning mixed taxa. In a subsequent application investigating the diversity and impact of gene-complete and -incomplete LT2 Pdu gene clusters in 1130 S.enterica genomes, GeneGrouper identified a novel, frequently occurring pduN pseudogene. When investigated in vivo, introduction of the pduN pseudogene negatively impacted microcompartment formation. We next demonstrated the versatility of GeneGrouper by clustering distant homologous gene clusters and variable gene clusters found in integrative and conjugative elements. Availability and implementation GeneGrouper software and code are publicly available at https://pypi.org/project/GeneGrouper/. Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btab752</identifier><identifier>PMID: 34734968</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Genome, Microbial ; Metagenome ; Metagenomics - methods ; Multigene Family ; Software</subject><ispartof>Bioinformatics (Oxford, England), 2022-01, Vol.38 (3), p.612-620</ispartof><rights>The Author(s) 2021. Published by Oxford University Press. All rights reserved. 
For permissions, please e-mail: journals.permissions@oup.com 2021</rights><rights>The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3</citedby><cites>FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3</cites><orcidid>0000-0002-1803-3623 ; 0000-0002-1110-0096</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1604,27924,27925</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btab752$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34734968$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Robinson, Peter</contributor><creatorcontrib>McFarland, Alexander G</creatorcontrib><creatorcontrib>Kennedy, Nolan W</creatorcontrib><creatorcontrib>Mills, Carolyn E</creatorcontrib><creatorcontrib>Tullman-Ercek, Danielle</creatorcontrib><creatorcontrib>Huttenhower, Curtis</creatorcontrib><creatorcontrib>Hartmann, Erica M</creatorcontrib><title>Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation Identifying variant forms of gene clusters of interest in phylogenetically proximate and distant taxa can help to infer their evolutionary histories and functions. 
Conserved gene clusters may differ by only a few genes, but these small differences can in turn induce substantial phenotypes, such as by the formation of pseudogenes or insertions interrupting regulation. Particularly as microbial genomes and metagenomic assemblies become increasingly abundant, unsupervised grouping of similar, but not necessarily identical, gene clusters into consistent bins can provide a population-level understanding of their gene content variation and functional homology. Results We developed GeneGrouper, a command-line tool that uses a density-based clustering method to group gene clusters into bins. GeneGrouper demonstrated high recall and precision in benchmarks for the detection of the 23-gene Salmonella enterica LT2 Pdu gene cluster and four-gene Pseudomonas aeruginosa PAO1 Mex gene cluster among 435 genomes spanning mixed taxa. In a subsequent application investigating the diversity and impact of gene-complete and -incomplete LT2 Pdu gene clusters in 1130 S.enterica genomes, GeneGrouper identified a novel, frequently occurring pduN pseudogene. When investigated in vivo, introduction of the pduN pseudogene negatively impacted microcompartment formation. We next demonstrated the versatility of GeneGrouper by clustering distant homologous gene clusters and variable gene clusters found in integrative and conjugative elements. Availability and implementation GeneGrouper software and code are publicly available at https://pypi.org/project/GeneGrouper/. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><subject>Genome, Microbial</subject><subject>Metagenome</subject><subject>Metagenomics - methods</subject><subject>Multigene Family</subject><subject>Software</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqNUNFKAzEQDKLYWv2Fkh84m1zuNr1HqVqFgi_6fFxymxppk5LkhP69Ka0F34SF2YWZ2d0hZMrZPWeNmCnrrTM-bLtkdZyp1ClZlxdkzAXIoppzfnnumRiRmxi_GGM1q-GajEQlRdXAfEzUI7po075QXcSeKuucdWvqDV2jQ6o3Q0wYIk2e5n0YqBmcTtY76gPFb78ZDkMX9vTTxuQzDvFgsMzqZfDDDsMtuTLdJuLdCSfk4_npffFSrN6Wr4uHVaErxlOhS9PVMMdeCmkAckFZ9xXko1EDgmaS9UpVtex6NKARJFRKND3nwHmjxITA0VcHH2NA0-6C3ebTWs7aQ2jt39DaU2hZOD0Kd4PaYn-W_aaUCfxIyP_81_QHTm6Cgg</recordid><startdate>20220112</startdate><enddate>20220112</enddate><creator>McFarland, Alexander G</creator><creator>Kennedy, Nolan W</creator><creator>Mills, Carolyn E</creator><creator>Tullman-Ercek, Danielle</creator><creator>Huttenhower, Curtis</creator><creator>Hartmann, Erica M</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-1803-3623</orcidid><orcidid>https://orcid.org/0000-0002-1110-0096</orcidid></search><sort><creationdate>20220112</creationdate><title>Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper</title><author>McFarland, Alexander G ; Kennedy, Nolan W ; Mills, Carolyn E ; Tullman-Ercek, Danielle ; Huttenhower, Curtis ; Hartmann, Erica 
M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Genome, Microbial</topic><topic>Metagenome</topic><topic>Metagenomics - methods</topic><topic>Multigene Family</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>McFarland, Alexander G</creatorcontrib><creatorcontrib>Kennedy, Nolan W</creatorcontrib><creatorcontrib>Mills, Carolyn E</creatorcontrib><creatorcontrib>Tullman-Ercek, Danielle</creatorcontrib><creatorcontrib>Huttenhower, Curtis</creatorcontrib><creatorcontrib>Hartmann, Erica M</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>McFarland, Alexander G</au><au>Kennedy, Nolan W</au><au>Mills, Carolyn E</au><au>Tullman-Ercek, Danielle</au><au>Huttenhower, Curtis</au><au>Hartmann, Erica M</au><au>Robinson, Peter</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2022-01-12</date><risdate>2022</risdate><volume>38</volume><issue>3</issue><spage>612</spage><epage>620</epage><pages>612-620</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>Abstract Motivation Identifying variant forms of gene clusters of interest in phylogenetically proximate and distant taxa can 
help to infer their evolutionary histories and functions. Conserved gene clusters may differ by only a few genes, but these small differences can in turn induce substantial phenotypes, such as by the formation of pseudogenes or insertions interrupting regulation. Particularly as microbial genomes and metagenomic assemblies become increasingly abundant, unsupervised grouping of similar, but not necessarily identical, gene clusters into consistent bins can provide a population-level understanding of their gene content variation and functional homology. Results We developed GeneGrouper, a command-line tool that uses a density-based clustering method to group gene clusters into bins. GeneGrouper demonstrated high recall and precision in benchmarks for the detection of the 23-gene Salmonella enterica LT2 Pdu gene cluster and four-gene Pseudomonas aeruginosa PAO1 Mex gene cluster among 435 genomes spanning mixed taxa. In a subsequent application investigating the diversity and impact of gene-complete and -incomplete LT2 Pdu gene clusters in 1130 S.enterica genomes, GeneGrouper identified a novel, frequently occurring pduN pseudogene. When investigated in vivo, introduction of the pduN pseudogene negatively impacted microcompartment formation. We next demonstrated the versatility of GeneGrouper by clustering distant homologous gene clusters and variable gene clusters found in integrative and conjugative elements. Availability and implementation GeneGrouper software and code are publicly available at https://pypi.org/project/GeneGrouper/. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>34734968</pmid><doi>10.1093/bioinformatics/btab752</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-1803-3623</orcidid><orcidid>https://orcid.org/0000-0002-1110-0096</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics (Oxford, England), 2022-01, Vol.38 (3), p.612-620
issn 1367-4803
1367-4811
language eng
recordid cdi_crossref_primary_10_1093_bioinformatics_btab752
source Open Access: Oxford University Press Open Journals
subjects Genome, Microbial
Metagenome
Metagenomics - methods
Multigene Family
Software
title Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T22%3A29%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-oup_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Density-based%20binning%20of%20gene%20clusters%20to%20infer%20function%20or%20evolutionary%20history%20using%20GeneGrouper&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=McFarland,%20Alexander%20G&rft.date=2022-01-12&rft.volume=38&rft.issue=3&rft.spage=612&rft.epage=620&rft.pages=612-620&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btab752&rft_dat=%3Coup_TOX%3E10.1093/bioinformatics/btab752%3C/oup_TOX%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c401t-c2fa568ed737f66f66625d46005ec6e6c070dbb457adef6ce6764b39d116119b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/34734968&rft_oup_id=10.1093/bioinformatics/btab752&rfr_iscdi=true