
Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequ...

Bibliographic Details
Published in: BMC genomics 2009-02, Vol.10 (1), p.61-61, Article 61
Main Authors: Wright, James C ; Sugden, Deana ; Francis-McIntyre, Sue ; Riba-Garcia, Isabel ; Gaskell, Simon J ; Grigoriev, Igor V ; Baker, Scott E ; Beynon, Robert J ; Hubbard, Simon J
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83
cites cdi_FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83
container_end_page 61
container_issue 1
container_start_page 61
container_title BMC genomics
container_volume 10
creator Wright, James C
Sugden, Deana
Francis-McIntyre, Sue
Riba-Garcia, Isabel
Gaskell, Simon J
Grigoriev, Igor V
Baker, Scott E
Beynon, Robert J
Hubbard, Simon J
description Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
doi_str_mv 10.1186/1471-2164-10-61
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_0add31f3c91348efb193fe134f5481b0</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A193915788</galeid><doaj_id>oai_doaj_org_article_0add31f3c91348efb193fe134f5481b0</doaj_id><sourcerecordid>A193915788</sourcerecordid><originalsourceid>FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83</originalsourceid><addsrcrecordid>eNp1kt9r1TAUx4sobk6ffZOiIPjQLT_atHkRrmPqhYHgj1dDmpx0GW1yTXLH_O9N7WWuMMlDDud88m2_55yieInRKcYdO8N1iyuCWV1hVDH8qDi-yzy-Fx8Vz2K8Rgi3HWmeFkeYY05z4bj4eXG7G71N1g3lLvgEfrKq1DLJ0vhQDuD8BKV0zieZrHc51HMWyslrGMsbOVq9VKwrN3EHYbDjuI-lswOE58UTI8cILw73SfHj48X388_V5ZdP2_PNZdW3GKXKACKoAwatggZx1nWc0J4QxtteYUR4dtIp2pmmJwiM1KQmEtdE96wFKjt6UmwXXe3ltdgFO8nwW3hpxd-ED4OQIVk1gkBSa4oNVRzTugPT51YYyLFp6g73KGu9X7R2-34CrcClIMeV6Lri7JUY_I0grM4NJ1ng9SLgY7IiKptAXSnvHKgkMG4ob5oMfVig3vr_fGVdUX4S80DFPFCBkWA4i7w9_Grwv_YQk5hsVDCO0oHfR8EYp5zRuT9vFnCQuQXWGZ811QyLTfbPcdN2M3X6AJWPhrwV3oGxOb968G71IDMJbtMg9zGK7beva_ZsYVXwMQYwd16zl3mZH3D36v4g_vGH7aV_AL_S7fc</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>66939638</pqid></control><display><type>article</type><title>Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger</title><source>Publicly Available Content Database</source><source>PubMed Central (PMC)</source><creator>Wright, James C ; Sugden, Deana ; Francis-McIntyre, Sue ; Riba-Garcia, Isabel ; Gaskell, Simon J ; Grigoriev, Igor V ; Baker, Scott E ; Beynon, Robert J ; Hubbard, Simon J</creator><creatorcontrib>Wright, James C ; Sugden, Deana ; Francis-McIntyre, Sue ; Riba-Garcia, Isabel ; Gaskell, Simon J ; Grigoriev, Igor V ; Baker, Scott E ; Beynon, Robert J ; Hubbard, Simon J ; USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)</creatorcontrib><description>Proteomic data is a potentially rich, but 
arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. 
A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.</description><identifier>ISSN: 1471-2164</identifier><identifier>EISSN: 1471-2164</identifier><identifier>DOI: 10.1186/1471-2164-10-61</identifier><identifier>PMID: 19193216</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Amino Acid Sequence ; Aspergillus ; Aspergillus niger - genetics ; Cluster Analysis ; Databases, Protein ; DNA sequencing ; Gene expression ; Genetic aspects ; Genome, Fungal ; Methods ; Models, Genetic ; Molecular Sequence Data ; Nucleotide sequencing ; Proteomics ; Proteomics - methods ; Sequence Alignment ; Tandem Mass Spectrometry</subject><ispartof>BMC genomics, 2009-02, Vol.10 (1), p.61-61, Article 61</ispartof><rights>COPYRIGHT 2009 BioMed Central Ltd.</rights><rights>Copyright © 2009 Wright et al; licensee BioMed Central Ltd. 
2009 Wright et al; licensee BioMed Central Ltd.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83</citedby><cites>FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2644712/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2644712/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,37013,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/19193216$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/1153955$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Wright, James C</creatorcontrib><creatorcontrib>Sugden, Deana</creatorcontrib><creatorcontrib>Francis-McIntyre, Sue</creatorcontrib><creatorcontrib>Riba-Garcia, Isabel</creatorcontrib><creatorcontrib>Gaskell, Simon J</creatorcontrib><creatorcontrib>Grigoriev, Igor V</creatorcontrib><creatorcontrib>Baker, Scott E</creatorcontrib><creatorcontrib>Beynon, Robert J</creatorcontrib><creatorcontrib>Hubbard, Simon J</creatorcontrib><creatorcontrib>USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)</creatorcontrib><title>Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger</title><title>BMC genomics</title><addtitle>BMC Genomics</addtitle><description>Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. 
Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. 
A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.</description><subject>Amino Acid Sequence</subject><subject>Aspergillus</subject><subject>Aspergillus niger - genetics</subject><subject>Cluster Analysis</subject><subject>Databases, Protein</subject><subject>DNA sequencing</subject><subject>Gene expression</subject><subject>Genetic aspects</subject><subject>Genome, Fungal</subject><subject>Methods</subject><subject>Models, Genetic</subject><subject>Molecular Sequence Data</subject><subject>Nucleotide sequencing</subject><subject>Proteomics</subject><subject>Proteomics - methods</subject><subject>Sequence Alignment</subject><subject>Tandem Mass Spectrometry</subject><issn>1471-2164</issn><issn>1471-2164</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNp1kt9r1TAUx4sobk6ffZOiIPjQLT_atHkRrmPqhYHgj1dDmpx0GW1yTXLH_O9N7WWuMMlDDud88m2_55yieInRKcYdO8N1iyuCWV1hVDH8qDi-yzy-Fx8Vz2K8Rgi3HWmeFkeYY05z4bj4eXG7G71N1g3lLvgEfrKq1DLJ0vhQDuD8BKV0zieZrHc51HMWyslrGMsbOVq9VKwrN3EHYbDjuI-lswOE58UTI8cILw73SfHj48X388_V5ZdP2_PNZdW3GKXKACKoAwatggZx1nWc0J4QxtteYUR4dtIp2pmmJwiM1KQmEtdE96wFKjt6UmwXXe3ltdgFO8nwW3hpxd-ED4OQIVk1gkBSa4oNVRzTugPT51YYyLFp6g73KGu9X7R2-34CrcClIMeV6Lri7JUY_I0grM4NJ1ng9SLgY7IiKptAXSnvHKgkMG4ob5oMfVig3vr_fGVdUX4S80DFPFCBkWA4i7w9_Grwv_YQk5hsVDCO0oHfR8EYp5zRuT9vFnCQuQXWGZ811QyLTfbPcdN2M3X6AJWPhrwV3oGxOb968G71IDMJbtMg9zGK7beva_ZsYVXwMQYwd16zl3mZH3D36v4g_vGH7aV_AL_S7fc</recordid><startdate>20090204</startdate><enddate>20090204</enddate><creator>Wright, James C</creator><creator>Sugden, Deana</creator><creator>Francis-McIntyre, Sue</creator><creator>Riba-Garcia, Isabel</creator><creator>Gaskell, Simon J</creator><creator>Grigoriev, Igor V</creator><creator>Baker, Scott 
E</creator><creator>Beynon, Robert J</creator><creator>Hubbard, Simon J</creator><general>BioMed Central Ltd</general><general>BioMed Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>7X8</scope><scope>OTOTI</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20090204</creationdate><title>Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger</title><author>Wright, James C ; Sugden, Deana ; Francis-McIntyre, Sue ; Riba-Garcia, Isabel ; Gaskell, Simon J ; Grigoriev, Igor V ; Baker, Scott E ; Beynon, Robert J ; Hubbard, Simon J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Amino Acid Sequence</topic><topic>Aspergillus</topic><topic>Aspergillus niger - genetics</topic><topic>Cluster Analysis</topic><topic>Databases, Protein</topic><topic>DNA sequencing</topic><topic>Gene expression</topic><topic>Genetic aspects</topic><topic>Genome, Fungal</topic><topic>Methods</topic><topic>Models, Genetic</topic><topic>Molecular Sequence Data</topic><topic>Nucleotide sequencing</topic><topic>Proteomics</topic><topic>Proteomics - methods</topic><topic>Sequence Alignment</topic><topic>Tandem Mass Spectrometry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wright, James C</creatorcontrib><creatorcontrib>Sugden, Deana</creatorcontrib><creatorcontrib>Francis-McIntyre, Sue</creatorcontrib><creatorcontrib>Riba-Garcia, Isabel</creatorcontrib><creatorcontrib>Gaskell, Simon J</creatorcontrib><creatorcontrib>Grigoriev, Igor V</creatorcontrib><creatorcontrib>Baker, Scott 
E</creatorcontrib><creatorcontrib>Beynon, Robert J</creatorcontrib><creatorcontrib>Hubbard, Simon J</creatorcontrib><creatorcontrib>USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Science (Gale in Context)</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>BMC genomics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wright, James C</au><au>Sugden, Deana</au><au>Francis-McIntyre, Sue</au><au>Riba-Garcia, Isabel</au><au>Gaskell, Simon J</au><au>Grigoriev, Igor V</au><au>Baker, Scott E</au><au>Beynon, Robert J</au><au>Hubbard, Simon J</au><aucorp>USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger</atitle><jtitle>BMC genomics</jtitle><addtitle>BMC Genomics</addtitle><date>2009-02-04</date><risdate>2009</risdate><volume>10</volume><issue>1</issue><spage>61</spage><epage>61</epage><pages>61-61</pages><artnum>61</artnum><issn>1471-2164</issn><eissn>1471-2164</eissn><abstract>Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. 
Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>19193216</pmid><doi>10.1186/1471-2164-10-61</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2164
ispartof BMC genomics, 2009-02, Vol.10 (1), p.61-61, Article 61
issn 1471-2164
1471-2164
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_0add31f3c91348efb193fe134f5481b0
source Publicly Available Content Database; PubMed Central (PMC)
subjects Amino Acid Sequence
Aspergillus
Aspergillus niger - genetics
Cluster Analysis
Databases, Protein
DNA sequencing
Gene expression
Genetic aspects
Genome, Fungal
Methods
Models, Genetic
Molecular Sequence Data
Nucleotide sequencing
Proteomics
Proteomics - methods
Sequence Alignment
Tandem Mass Spectrometry
title Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T01%3A07%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploiting%20proteomic%20data%20for%20genome%20annotation%20and%20gene%20model%20validation%20in%20Aspergillus%20niger&rft.jtitle=BMC%20genomics&rft.au=Wright,%20James%20C&rft.aucorp=USDOE%20Joint%20Genome%20Institute%20(JGI),%20Berkeley,%20CA%20(United%20States)&rft.date=2009-02-04&rft.volume=10&rft.issue=1&rft.spage=61&rft.epage=61&rft.pages=61-61&rft.artnum=61&rft.issn=1471-2164&rft.eissn=1471-2164&rft_id=info:doi/10.1186/1471-2164-10-61&rft_dat=%3Cgale_doaj_%3EA193915788%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-b710t-fe0208e6e7ce509688923b22697bc10291478c38f5b20efad242a142db67e3a83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=66939638&rft_id=info:pmid/19193216&rft_galeid=A193915788&rfr_iscdi=true