A comparative study of patent sequence databases

Nucleic acid and protein sequence data from patent publications is available from a plurality of commercial and public sources. As the searching and analysis of this data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group...

Bibliographic Details
Published in: World patent information 2008-12, Vol.30 (4), p.300-308
Main Authors: Andree, Piet Jan; Harper, Mark F.; Nauche, Stéphane; Poolman, Robert A.; Shaw, Jo; Swinkels, Joop C.; Wycherley, Sally
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73
cites cdi_FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73
container_end_page 308
container_issue 4
container_start_page 300
container_title World patent information
container_volume 30
creator Andree, Piet Jan
Harper, Mark F.
Nauche, Stéphane
Poolman, Robert A.
Shaw, Jo
Swinkels, Joop C.
Wycherley, Sally
description Nucleic acid and protein sequence data from patent publications is available from a plurality of commercial and public sources. As the searching and analysis of this data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate patent sequence databases for data content. A series of sequences were searched to find similar sequences from several well-known sources: GENESEQ™, CAS REGISTRY/CAplus℠, PCTGEN, NCBI GenBank®, EMBL-Bank and the EBI Fasta databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus℠ results within the context of indexing policy and patent coverage. In comparison to the proprietary databases, the authors have identified important deficiencies in the content of the public databanks. This paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data.
doi_str_mv 10.1016/j.wpi.2008.04.005
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_57723403</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0172219008000471</els_id><sourcerecordid>57723403</sourcerecordid><originalsourceid>FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EEuXxAeyyYpcwfiROxKqqeKoSG5DYWY49Ea6aJthuq_49botYsrgzkmfu1fgQckOhoECru0WxHV3BAOoCRAFQnpAJraXIqwY-T8kEqGQ5ow2ck4sQFgBU1NBMCEwzM_Sj9jq6DWYhru0uG7ps1BFXMQv4vcaVwczqqFsdMFyRs04vA17_9kvy8fjwPnvO529PL7PpPDeCQcyrEhuwVaWZ7BhrZNWKirW1AZS87AznsuGitlTyGkwnrbCCVVC2ojUtSCv5Jbk95o5-SDeEqHoXDC6XeoXDOqhSSsYF8LRIj4vGDyF47NToXa_9TlFQezZqoRIbtWejQKjEJnlejx6PI5o_AyJuB5--rjaKaw6p7JIOTq5dkkgaD7M0TK9fsU9h98cwTDg2Dr0Kxu2hWefRRGUH988pPzdjhFE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>57723403</pqid></control><display><type>article</type><title>A comparative study of patent sequence databases</title><source>Library &amp; Information Science Abstracts (LISA)</source><source>ScienceDirect Freedom Collection</source><creator>Andree, Piet Jan ; Harper, Mark F. ; Nauche, Stéphane ; Poolman, Robert A. ; Shaw, Jo ; Swinkels, Joop C. ; Wycherley, Sally</creator><creatorcontrib>Andree, Piet Jan ; Harper, Mark F. ; Nauche, Stéphane ; Poolman, Robert A. ; Shaw, Jo ; Swinkels, Joop C. ; Wycherley, Sally</creatorcontrib><description>Nucleic acid and protein sequence data from patent publications is available from a plurality of commercial and public sources. As the searching and analysis of this data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate patent sequence databases for data content. 
A series of sequences were searched to find similar sequences from several well known sources: GENESEQ™, CAS REGISTRY/CAplus SM, PCTGEN, NCBI GenBank ®, EMBL-Bank and the EBI Fasta databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus SM results within the context of indexing policy and patent coverage. In comparison to the proprietary databases, the authors have identified important deficiencies in the content of the public databanks. This paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data.</description><identifier>ISSN: 0172-2190</identifier><identifier>EISSN: 1874-690X</identifier><identifier>DOI: 10.1016/j.wpi.2008.04.005</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Biosequences ; CAplus ; EBI Fasta ; EMBL-Bank ; Full text databases ; GenBank ; GENESEQ ; Patent Documentation Group ; Patent sequences ; PCTGEN ; PDG ; REGISTRY ; Searching ; Sequence databases ; Sequence databases Sequence searching Biosequences Patent sequences GENESEQ REGISTRY CAplus PCTGEN GenBank EMBL-Bank EBI Fasta PDG Patent Documentation Group ; Sequence searching</subject><ispartof>World patent information, 2008-12, Vol.30 (4), p.300-308</ispartof><rights>2008 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73</citedby><cites>FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,777,781,27905,27906,34117</link.rule.ids><backlink>$$Uhttp://econpapers.repec.org/article/eeeworpat/v_3a30_3ay_3a2008_3ai_3a4_3ap_3a300-308.htm$$DView record in 
RePEc$$Hfree_for_read</backlink></links><search><creatorcontrib>Andree, Piet Jan</creatorcontrib><creatorcontrib>Harper, Mark F.</creatorcontrib><creatorcontrib>Nauche, Stéphane</creatorcontrib><creatorcontrib>Poolman, Robert A.</creatorcontrib><creatorcontrib>Shaw, Jo</creatorcontrib><creatorcontrib>Swinkels, Joop C.</creatorcontrib><creatorcontrib>Wycherley, Sally</creatorcontrib><title>A comparative study of patent sequence databases</title><title>World patent information</title><description>Nucleic acid and protein sequence data from patent publications is available from a plurality of commercial and public sources. As the searching and analysis of this data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate patent sequence databases for data content. A series of sequences were searched to find similar sequences from several well known sources: GENESEQ™, CAS REGISTRY/CAplus SM, PCTGEN, NCBI GenBank ®, EMBL-Bank and the EBI Fasta databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus SM results within the context of indexing policy and patent coverage. In comparison to the proprietary databases, the authors have identified important deficiencies in the content of the public databanks. 
This paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data.</description><subject>Biosequences</subject><subject>CAplus</subject><subject>EBI Fasta</subject><subject>EMBL-Bank</subject><subject>Full text databases</subject><subject>GenBank</subject><subject>GENESEQ</subject><subject>Patent Documentation Group</subject><subject>Patent sequences</subject><subject>PCTGEN</subject><subject>PDG</subject><subject>REGISTRY</subject><subject>Searching</subject><subject>Sequence databases</subject><subject>Sequence databases Sequence searching Biosequences Patent sequences GENESEQ REGISTRY CAplus PCTGEN GenBank EMBL-Bank EBI Fasta PDG Patent Documentation Group</subject><subject>Sequence searching</subject><issn>0172-2190</issn><issn>1874-690X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>F2A</sourceid><recordid>eNp9kMtOwzAQRS0EEuXxAeyyYpcwfiROxKqqeKoSG5DYWY49Ea6aJthuq_49botYsrgzkmfu1fgQckOhoECru0WxHV3BAOoCRAFQnpAJraXIqwY-T8kEqGQ5ow2ck4sQFgBU1NBMCEwzM_Sj9jq6DWYhru0uG7ps1BFXMQv4vcaVwczqqFsdMFyRs04vA17_9kvy8fjwPnvO529PL7PpPDeCQcyrEhuwVaWZ7BhrZNWKirW1AZS87AznsuGitlTyGkwnrbCCVVC2ojUtSCv5Jbk95o5-SDeEqHoXDC6XeoXDOqhSSsYF8LRIj4vGDyF47NToXa_9TlFQezZqoRIbtWejQKjEJnlejx6PI5o_AyJuB5--rjaKaw6p7JIOTq5dkkgaD7M0TK9fsU9h98cwTDg2Dr0Kxu2hWefRRGUH988pPzdjhFE</recordid><startdate>20081201</startdate><enddate>20081201</enddate><creator>Andree, Piet Jan</creator><creator>Harper, Mark F.</creator><creator>Nauche, Stéphane</creator><creator>Poolman, Robert A.</creator><creator>Shaw, Jo</creator><creator>Swinkels, Joop C.</creator><creator>Wycherley, Sally</creator><general>Elsevier Ltd</general><general>Elsevier</general><scope>DKI</scope><scope>X2L</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope></search><sort><creationdate>20081201</creationdate><title>A comparative study of patent sequence 
databases</title><author>Andree, Piet Jan ; Harper, Mark F. ; Nauche, Stéphane ; Poolman, Robert A. ; Shaw, Jo ; Swinkels, Joop C. ; Wycherley, Sally</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Biosequences</topic><topic>CAplus</topic><topic>EBI Fasta</topic><topic>EMBL-Bank</topic><topic>Full text databases</topic><topic>GenBank</topic><topic>GENESEQ</topic><topic>Patent Documentation Group</topic><topic>Patent sequences</topic><topic>PCTGEN</topic><topic>PDG</topic><topic>REGISTRY</topic><topic>Searching</topic><topic>Sequence databases</topic><topic>Sequence databases Sequence searching Biosequences Patent sequences GENESEQ REGISTRY CAplus PCTGEN GenBank EMBL-Bank EBI Fasta PDG Patent Documentation Group</topic><topic>Sequence searching</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Andree, Piet Jan</creatorcontrib><creatorcontrib>Harper, Mark F.</creatorcontrib><creatorcontrib>Nauche, Stéphane</creatorcontrib><creatorcontrib>Poolman, Robert A.</creatorcontrib><creatorcontrib>Shaw, Jo</creatorcontrib><creatorcontrib>Swinkels, Joop C.</creatorcontrib><creatorcontrib>Wycherley, Sally</creatorcontrib><collection>RePEc IDEAS</collection><collection>RePEc</collection><collection>CrossRef</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><jtitle>World patent information</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Andree, Piet Jan</au><au>Harper, Mark F.</au><au>Nauche, Stéphane</au><au>Poolman, Robert A.</au><au>Shaw, Jo</au><au>Swinkels, Joop C.</au><au>Wycherley, 
Sally</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A comparative study of patent sequence databases</atitle><jtitle>World patent information</jtitle><date>2008-12-01</date><risdate>2008</risdate><volume>30</volume><issue>4</issue><spage>300</spage><epage>308</epage><pages>300-308</pages><issn>0172-2190</issn><eissn>1874-690X</eissn><abstract>Nucleic acid and protein sequence data from patent publications is available from a plurality of commercial and public sources. As the searching and analysis of this data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate patent sequence databases for data content. A series of sequences were searched to find similar sequences from several well known sources: GENESEQ™, CAS REGISTRY/CAplus SM, PCTGEN, NCBI GenBank ®, EMBL-Bank and the EBI Fasta databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus SM results within the context of indexing policy and patent coverage. In comparison to the proprietary databases, the authors have identified important deficiencies in the content of the public databanks. This paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.wpi.2008.04.005</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0172-2190
ispartof World patent information, 2008-12, Vol.30 (4), p.300-308
issn 0172-2190
1874-690X
language eng
recordid cdi_proquest_miscellaneous_57723403
source Library & Information Science Abstracts (LISA); ScienceDirect Freedom Collection
subjects Biosequences
CAplus
EBI Fasta
EMBL-Bank
Full text databases
GenBank
GENESEQ
Patent Documentation Group
Patent sequences
PCTGEN
PDG
REGISTRY
Searching
Sequence databases
Sequence searching
title A comparative study of patent sequence databases
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T23%3A40%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20comparative%20study%20of%20patent%20sequence%20databases&rft.jtitle=World%20patent%20information&rft.au=Andree,%20Piet%20Jan&rft.date=2008-12-01&rft.volume=30&rft.issue=4&rft.spage=300&rft.epage=308&rft.pages=300-308&rft.issn=0172-2190&rft.eissn=1874-690X&rft_id=info:doi/10.1016/j.wpi.2008.04.005&rft_dat=%3Cproquest_cross%3E57723403%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c420t-65e90d66a27f22976b462b8c0e735fc3379348d17380cf7d4d42605b4bcb07d73%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=57723403&rft_id=info:pmid/&rfr_iscdi=true