Loading…

The frequency distribution of gene family sizes in complete genomes

We compare the frequency distribution of gene family sizes in the complete genomes of six bacteria (Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma pneumoniae, and Synechocystis sp. PCC6803), two Archaea (Methanococcus jannaschii and Methanobacterium...

Full description

Saved in:
Bibliographic Details
Published in:Molecular biology and evolution 1998-05, Vol.15 (5), p.583-589
Main Authors: Huynen, M A, van Nimwegen, E
Format: Article
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We compare the frequency distribution of gene family sizes in the complete genomes of six bacteria (Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma pneumoniae, and Synechocystis sp. PCC6803), two Archaea (Methanococcus jannaschii and Methanobacterium thermoautotrophicum), one eukaryote (Saccharomyces cerevisiae), the vaccinia virus, and the bacteriophage T4. The sizes of the gene families versus their frequencies show power-law distributions that tend to become flatter (have a larger exponent) as the number of genes in the genome increases. Power-law distributions generally occur as the limit distribution of a multiplicative stochastic process with a boundary constraint. We discuss various models that can account for a multiplicative process determining the sizes of gene families in the genome. In particular, we argue that, in order to explain the observed distributions, gene families have to behave in a coherent fashion within the genome; i.e., the probabilities of duplications of genes within a gene family are not independent of each other. Likewise, the probabilities of deletions of genes within a gene family are not independent of each other.
ISSN:0737-4038
1537-1719
DOI:10.1093/oxfordjournals.molbev.a025959