Loading…
The human genome contracts again
The number of human genomes that have been sequenced completely for different individuals has increased rapidly in recent years. Storing and transferring complete genomes between computers for the purpose of applying various applications and analysis tools will soon become a major hurdle, hindering...
Saved in:
Published in: | Bioinformatics 2013-09, Vol.29 (17), p.2199-2202 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The number of human genomes that have been sequenced completely for different individuals has increased rapidly in recent years. Storing and transferring complete genomes between computers for the purpose of applying various applications and analysis tools will soon become a major hurdle, hindering the analysis phase. Therefore, there is a growing need to compress these data efficiently. Here, we describe a technique to compress human genomes based on entropy coding, using a reference genome and known Single Nucleotide Polymorphisms (SNPs). Furthermore, we explore several intrinsic features of genomes and information in other genomic databases to further improve the compression attained. Using these methods, we compress James Watson's genome to 2.5 megabytes (MB), improving on recent work by 37%. Similar compression is obtained for most genomes available from the 1000 Genomes Project. Our biologically inspired techniques promise even greater gains for genomes of lower organisms and for human genomes as more genomic data become available.
Code is available at sourceforge.net/projects/genomezip/ |
---|---|
ISSN: | 1367-4803 1367-4811 1460-2059 |
DOI: | 10.1093/bioinformatics/btt362 |