A comparative study of patent sequence databases

Bibliographic Details
Published in: World Patent Information, 2008-12, Vol. 30 (4), p. 300-308
Main Authors: Andree, Piet Jan; Harper, Mark F.; Nauche, Stéphane; Poolman, Robert A.; Shaw, Jo; Swinkels, Joop C.; Wycherley, Sally
Format: Article
Language: English
Description
Summary: Nucleic acid and protein sequence data from patent publications is available from a plurality of commercial and public sources. As the searching and analysis of this data is of crucial importance to the life sciences industry, the Patent Documentation Group’s Biotechnology Information Working Group conducted a study to critically compare and evaluate patent sequence databases for data content. A series of sequences was searched to find similar sequences in several well-known sources: GENESEQ™, CAS REGISTRY/CAplus℠, PCTGEN, NCBI GenBank®, EMBL-Bank and the EBI Fasta databases. The study highlights some differences between GENESEQ™ and REGISTRY/CAplus℠ results within the context of indexing policy and patent coverage. In comparison to the proprietary databases, the authors have identified important deficiencies in the content of the public databanks. This paper also discusses database timeliness and the choice of algorithm as potential reasons for missing data.
ISSN: 0172-2190, 1874-690X
DOI: 10.1016/j.wpi.2008.04.005