A database of unique protein sequence identifiers for proteome studies
Published in: | Proteomics (Weinheim) 2006-08, Vol.6 (16), p.4514-4522 |
---|---|
Main Authors: | Babnigg, György; Giometti, Carol S. |
Format: | Article |
Language: | English |
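The abstract says only that SEGUIDs are "derived from primary protein sequences." The following is a minimal Python sketch of how such an identifier can be computed. It assumes the commonly described SEGUID recipe: take the SHA-1 digest of the upper-cased sequence, Base64-encode it, and strip the trailing `=` padding. Removing whitespace (for example FASTA line breaks) is an additional assumption of this sketch, and the function name `seguid` is illustrative rather than an API of the SEGUID Database.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a SEGUID-style identifier for a protein sequence.

    Assumed recipe: Base64(SHA-1(upper-cased sequence)) with '=' padding removed.
    """
    # Assumption: drop all whitespace (e.g. FASTA line breaks) and normalize case,
    # so 'mkv lAA' and 'MKVLAA' map to the same identifier.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()  # 20-byte digest
    # Standard Base64 of 20 bytes is 28 characters ending in a single '=',
    # so the stripped identifier is 27 characters long.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    a = seguid("MKVLAAGIVGL")
    b = seguid("mkvlaagivgl")
    print(a, len(a))
    print(a == b)  # True: case does not change the identifier
```

Because the identifier depends only on the residue string, any site holding the same sequence computes the same value without consulting a central registry, which is what lets it survive annotation changes.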
cited_by | cdi_FETCH-LOGICAL-c4422-7a5db10c3fd44cdbb258aa8dd9207bfa06d6fc4495d5d35e77a4d59274d215123 |
---|---|
cites | cdi_FETCH-LOGICAL-c4422-7a5db10c3fd44cdbb258aa8dd9207bfa06d6fc4495d5d35e77a4d59274d215123 |
container_end_page | 4522 |
container_issue | 16 |
container_start_page | 4514 |
container_title | Proteomics (Weinheim) |
container_volume | 6 |
creator | Babnigg, György; Giometti, Carol S. |
description | In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database‐specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2‐DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications. |
doi_str_mv | 10.1002/pmic.200600032 |
format | article |
pmid | 16858731 |
publisher | WILEY-VCH Verlag, Weinheim |
rights | Copyright © 2006 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim; 2006 INIST-CNRS |
fulltext | fulltext |
identifier | ISSN: 1615-9853 |
ispartof | Proteomics (Weinheim), 2006-08, Vol.6 (16), p.4514-4522 |
issn | 1615-9853 (print); 1615-9861 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_68836417 |
source | Wiley |
subjects | Amino Acid Sequence; Analytical, structural and metabolic biochemistry; Animals; Biological and medical sciences; Computational Biology; Databases, Protein; Fundamental and applied biological sciences. Psychology; Humans; Miscellaneous; Molecular Sequence Data; Protein sequence identification; Proteins; Proteomics; SEGUID database; Sequence Analysis, Protein; Software |
title | A database of unique protein sequence identifiers for proteome studies |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T17%3A32%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20database%20of%20unique%20protein%20sequence%20identifiers%20for%20proteome%20studies&rft.jtitle=Proteomics%20(Weinheim)&rft.au=Babnigg,%20Gy%C3%B6rgy&rft.date=2006-08-01&rft.volume=6&rft.issue=16&rft.spage=4514&rft.epage=4522&rft.pages=4514-4522&rft.issn=1615-9853&rft.eissn=1615-9861&rft_id=info:doi/10.1002/pmic.200600032&rft_dat=%3Cproquest_cross%3E68836417%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4422-7a5db10c3fd44cdbb258aa8dd9207bfa06d6fc4495d5d35e77a4d59274d215123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=19471712&rft_id=info:pmid/16858731&rfr_iscdi=true |
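The abstract describes SEGUIDs as a common link between sequence databases and notes that sequence-based predictions (e.g., pI, Mr) need to be calculated only once per sequence. The sketch below illustrates both ideas. It assumes the commonly described SEGUID recipe (SHA-1 of the upper-cased sequence, Base64-encoded, `=` padding stripped). The helper names `link_accessions` and `PropertyCache`, the accessions, and the placeholder length property are all hypothetical and are not taken from the paper or from the SEGUID Database.

```python
import base64
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def seguid(sequence: str) -> str:
    # Assumed SEGUID recipe: Base64(SHA-1(upper-cased sequence)), '=' padding removed.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def link_accessions(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: group (accession, sequence) pairs from any databases by SEGUID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, sequence in entries:
        groups[seguid(sequence)].append(accession)
    return dict(groups)


class PropertyCache:
    """Hypothetical cache: compute a sequence-derived value once per unique SEGUID."""

    def __init__(self, compute: Callable[[str], float]) -> None:
        self._compute = compute
        self._values: Dict[str, float] = {}
        self.computations = 0  # number of times `compute` actually ran

    def get(self, sequence: str) -> float:
        key = seguid(sequence)
        if key not in self._values:
            self._values[key] = self._compute(sequence)
            self.computations += 1
        return self._values[key]


if __name__ == "__main__":
    # Hypothetical accessions: one sequence stored under two database-specific IDs.
    entries = [
        ("dbA|0001", "MKVLAAGIVGL"),
        ("dbB|XYZ9", "mkvlaagivgl"),
        ("dbA|0002", "MSTNPKPQRK"),
    ]
    print(link_accessions(entries))

    # Placeholder property (residue count); a real pipeline would plug in pI or Mr here.
    cache = PropertyCache(lambda s: float(len("".join(s.split()))))
    for _, seq in entries:
        cache.get(seq)
    print(cache.computations)  # 2: the first two entries share one SEGUID
```

Keying both the cross-reference table and the cached predictions on the sequence-derived identifier means a database update that only changes accessions or annotations leaves the stored calculations valid.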