
A database of unique protein sequence identifiers for proteome studies

In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is red...

Bibliographic Details
Published in: Proteomics (Weinheim), 2006-08, Vol. 6 (16), p. 4514-4522
Main Authors: Babnigg, György; Giometti, Carol S.
Format: Article
Language: English
container_end_page 4522
container_issue 16
container_start_page 4514
container_title Proteomics (Weinheim)
container_volume 6
creator Babnigg, György
Giometti, Carol S.
description In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database‐specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2‐DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
doi_str_mv 10.1002/pmic.200600032
format article
eissn 1615-9861
pmid 16858731
publisher Weinheim: WILEY-VCH Verlag
rights Copyright © 2006 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim
2006 INIST-CNRS
toplevel peer_reviewed
online_resources
tpages 9
backlink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=18030425 (View record in Pascal Francis)
https://www.ncbi.nlm.nih.gov/pubmed/16858731 (View this record in MEDLINE/PubMed)
fulltext fulltext
identifier ISSN: 1615-9853
ispartof Proteomics (Weinheim), 2006-08, Vol.6 (16), p.4514-4522
issn 1615-9853
1615-9861
language eng
recordid cdi_proquest_miscellaneous_68836417
source Wiley
subjects Amino Acid Sequence
Analytical, structural and metabolic biochemistry
Animals
Biological and medical sciences
Computational Biology
Databases, Protein
Fundamental and applied biological sciences. Psychology
Humans
Miscellaneous
Molecular Sequence Data
Protein sequence identification
Proteins
Proteomics
SEGUID database
Sequence Analysis, Protein
Software
title A database of unique protein sequence identifiers for proteome studies
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T17%3A32%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20database%20of%20unique%20protein%20sequence%20identifiers%20for%20proteome%20studies&rft.jtitle=Proteomics%20(Weinheim)&rft.au=Babnigg,%20Gy%C3%B6rgy&rft.date=2006-08-01&rft.volume=6&rft.issue=16&rft.spage=4514&rft.epage=4522&rft.pages=4514-4522&rft.issn=1615-9853&rft.eissn=1615-9861&rft_id=info:doi/10.1002/pmic.200600032&rft_dat=%3Cproquest_cross%3E68836417%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4422-7a5db10c3fd44cdbb258aa8dd9207bfa06d6fc4495d5d35e77a4d59274d215123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=19471712&rft_id=info:pmid/16858731&rfr_iscdi=true
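
The description above says that SEGUIDs are derived from primary protein sequences and serve as a common link between databases that use different identifiers for the same sequence. The record does not state how the identifier is computed. The sketch below assumes the commonly described construction (a SHA-1 digest of the uppercased sequence, Base64-encoded with the trailing padding removed); the function names, database names, accession strings, and sequences are hypothetical and chosen only for illustration.

```python
import base64
import hashlib
from collections import defaultdict


def seguid(sequence: str) -> str:
    """Sequence-derived identifier (assumed scheme: SHA-1, then Base64, padding stripped)."""
    normalized = sequence.upper()  # assumption: case is not significant
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def link_identifiers(databases):
    """Group database-specific identifiers by the identifier derived from their sequence."""
    links = defaultdict(set)
    for db_name, entries in databases.items():
        for identifier, sequence in entries.items():
            links[seguid(sequence)].add((db_name, identifier))
    return dict(links)


# Hypothetical entries: two databases hold the same sequence under different identifiers.
example_databases = {
    "db_a": {"A0001": "MKTAYIAKQR"},
    "db_b": {"B-17": "mktayiakqr", "B-18": "MKTAYIAKQK"},
}

links = link_identifiers(example_databases)
for key, members in links.items():
    print(key, sorted(members))
```

Because the key depends only on the sequence, an annotation change or an identifier change in either database leaves the link intact, and any per-sequence calculation (for example pI or Mr) could be stored once under that key.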