The predictive power of the CluSTr database

Bibliographic Details
Published in: Bioinformatics 2005-09, Vol. 21 (18), p. 3604-3609
Main Authors: Petryszak, Robert; Kretschmann, Ernst; Wieser, Daniela; Apweiler, Rolf
Format: Article
Language: English
Description
Summary: The CluSTr database employs a fully automatic single-linkage hierarchical clustering method based on a similarity matrix. In order to compute the matrix, first all-against-all pair-wise comparisons between protein sequences are computed using the Smith–Waterman algorithm. The statistical significance of the similarity scores is then assessed using a Monte Carlo analysis, yielding Z-values, which are used to populate the matrix. This paper describes automated annotation experiments that quantify the predictive power and hence the biological relevance of the CluSTr data. The experiments utilized the UniProt data-mining framework to derive annotation predictions using combinations of InterPro and CluSTr. We show that this combination of data sources greatly increases the precision of predictions made by the data-mining framework, compared with the use of InterPro data alone. We conclude that the CluSTr approach to clustering proteins makes a valuable contribution to traditional protein classifications.
Availability: http://www.ebi.ac.uk/clustr/
Contact: rolf.apweiler@ebi.ac.uk
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/bti542
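
Illustration: the pipeline outlined in the summary (pairwise Smith–Waterman scores, Monte Carlo Z-values, single-linkage clustering) can be sketched roughly as follows. This is a toy sketch, not the CluSTr implementation. The scoring scheme (match/mismatch/linear gap), the number of shuffles, the Z-value threshold, and all function names are assumptions chosen for illustration; this record does not state the parameters the authors used.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, rng=None):
    """Monte Carlo Z-value: observed score against scores of shuffled b."""
    rng = rng or random.Random(0)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        background.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.fmean(background)
    std = statistics.pstdev(background)
    if std == 0:
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / std


def single_linkage_clusters(names, z, threshold):
    """Cut a single-linkage hierarchy at one level: two sequences share a
    cluster if a chain of pairs with Z >= threshold connects them."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (p, q), value in z.items():
        if value >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hand-made Z-values (hypothetical) showing the chaining behaviour:
# a and c are only weakly similar, yet both link strongly to b.
toy_z = {("a", "b"): 10.0, ("b", "c"): 9.0, ("a", "c"): 1.0, ("c", "d"): 2.0}
print(single_linkage_clusters(["a", "b", "c", "d"], toy_z, threshold=8.0))
# prints [['a', 'b', 'c'], ['d']]
```

Lowering or raising the threshold yields coarser or finer clusters, which is what makes the result hierarchical. In practice each entry of the matrix would come from `z_value` applied to a pair of protein sequences rather than from hand-made numbers.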