Prospects for Building the Tree of Life from Large Sequence Databases

Bibliographic Details
Published in: Science (American Association for the Advancement of Science), 2004-11, Vol. 306 (5699), p. 1172-1174
Main Authors: Driskell, Amy C., Ané, Cécile, Burleigh, J. Gordon, McMahon, Michelle M., O'Meara, Brian C., Sanderson, Michael J.
Format: Article
Language: English
Description
Summary: We assess the phylogenetic potential of ~300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.
ISSN: 0036-8075, 1095-9203
DOI: 10.1126/science.1102036