
Automated pseudogene detection reveals insights into historical gene sharing dynamics in prokaryotes

In recent years it has become apparent that prokaryotic genomes contain large numbers of pseudogenised genes which may provide valuable insights into the recent functional history of an organism. However, pseudogenes are difficult to detect ab initio and are not routinely reported by gene prediction t...

Bibliographic Details
Published in: Access microbiology 2022-05, Vol.4 (5)
Main Authors: Dimonaco, Nicholas J, Aubrey, Wayne, Clare, Amanda, Kenobi, Kim, Creevey, Chris
Format: Article
Language: English
container_issue 5
container_title Access microbiology
container_volume 4
creator Dimonaco, Nicholas J
Aubrey, Wayne
Clare, Amanda
Kenobi, Kim
Creevey, Chris
description In recent years it has become apparent that prokaryotic genomes contain large numbers of pseudogenised genes which may provide valuable insights into the recent functional history of an organism. However, pseudogenes are difficult to detect ab initio and are not routinely reported by gene prediction tools. We present StORF-R (Stop-ORF-Reporter), a tool that takes as input an annotated genome and returns putative missed genes (functional and/or pseudogenised) from the intergenic regions. We show that this methodology can recover gene-families that the state-of-the-art methods continue to misreport or completely omit. We applied StORF-R to the intergenic regions of 2,665 E. coli genomes and found on average 244 previously missed pseudogenised genes (with in-frame stop codons) per genome, many of which had high scoring similarity to known Swiss-Prot proteins. Many of these pseudogenised genes form widespread gene families across E. coli strains. To investigate if this phenomenon exists in other taxa we further applied the methodology to 44,048 bacterial genomes representing 8,244 species from Ensembl. This revealed many gene-families spanning multiple species with large (>10,000) numbers of copies of both intact and pseudogenised versions. Many of these families had only previously been reported in a single or few genomes, though we detected many hundred pseudogenised versions with StORF-R, changing our understanding of how widespread these genes truly are. These pseudogenised genes represent a pangenomic ‘graveyard’ which may alter our understanding of the definition of core and accessory genes for many species.
doi_str_mv 10.1099/acmi.ac2021.po0147
format article
fulltext fulltext
identifier ISSN: 2516-8290
ispartof Access microbiology, 2022-05, Vol.4 (5)
issn 2516-8290
2516-8290
language eng
recordid cdi_crossref_primary_10_1099_acmi_ac2021_po0147
source PubMed Central Free
title Automated pseudogene detection reveals insights into historical gene sharing dynamics in prokaryotes
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T20%3A19%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automated%20pseudogene%20detection%20reveals%20insights%20into%20historical%20gene%20sharing%20dynamics%20in%20prokaryotes&rft.jtitle=Access%20microbiology&rft.au=Dimonaco,%20Nicholas%20J&rft.date=2022-05-18&rft.volume=4&rft.issue=5&rft.issn=2516-8290&rft.eissn=2516-8290&rft_id=info:doi/10.1099/acmi.ac2021.po0147&rft_dat=%3Ccrossref%3E10_1099_acmi_ac2021_po0147%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c877-5433bf7480ddb63d60fd870a2dbfe62a2c46fed6db2933a0fbbeabb175e0bf423%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true