Compression of Graphical Structures: Fundamental Limits, Algorithms, and Experiments

Bibliographic Details
Published in: IEEE Transactions on Information Theory, 2012-02, Vol. 58 (2), p. 620-638
Main Authors: Choi, Yongwook; Szpankowski, W.
Format: Article
Language: English
Description
Summary: Information theory traditionally deals with "conventional data," be it textual data, image, or video data. However, databases of various sorts have come into existence in recent years for storing "unconventional data" including biological data, social data, web data, topographical maps, and medical data. In compressing such data, one must consider two types of information: the information conveyed by the structure itself, and the information conveyed by the data labels implanted in the structure. In this paper, we attempt to address the former problem by studying information of graphical structures (i.e., unlabeled graphs). As the first step, we consider the Erdős-Rényi graphs $G(n,p)$ over $n$ vertices in which edges are added independently and randomly with probability $p$. We prove that the structural entropy of $G(n,p)$ is $\binom{n}{2}h(p) - \log n! + o(1) = \binom{n}{2}h(p) - n\log n + O(n)$, where $h(p) = -p\log p - (1-p)\log(1-p)$ is the entropy rate of a conventional memoryless binary source. Then, we propose a two-stage compression algorithm that asymptotically achieves the structural entropy up to the $n\log n$ term (i.e., the first two leading terms) of the structural entropy. Our algorithm runs either in time $O(n^2)$ in the worst case for any graph or in time $O(n+e)$ on average for graphs generated by $G(n,p)$, where $e$ is the average number of edges. To the best of our knowledge, this is the first provable (asymptotically) optimal graph compressor for Erdős-Rényi graph models. We use combinatorial and analytic techniques such as generating functions, Mellin transform, and poissonization to establish these findings. Our experiments confirm the theoretical results and also show the usefulness of our algorithm for some real-world graphs such as the Internet, biological networks, and social networks.
ISSN: 0018-9448, 1557-9654
DOI: 10.1109/TIT.2011.2173710
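
Illustration: the structural entropy expression quoted in the summary can be evaluated numerically to see how much the unlabeled structure saves over the labeled graph. The following is a minimal sketch, not the authors' code. It assumes logarithms are base 2 (bits), drops the o(1) term, and uses hypothetical function names chosen for this example. Because the result is asymptotic, the value is only a rough guide for small n, and the summary does not state the conditions on p under which it holds.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) = -p*log2(p) - (1-p)*log2(1-p), in bits; taken as 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def labeled_entropy(n: int, p: float) -> float:
    """Entropy of a labeled G(n, p) graph: C(n, 2) independent edge indicators."""
    return math.comb(n, 2) * binary_entropy(p)


def structural_entropy_estimate(n: int, p: float) -> float:
    """C(n, 2) * h(p) - log2(n!), i.e. the quoted formula without the o(1) term.

    Assumption: base-2 logarithms. log2(n!) is computed via lgamma to avoid
    building the huge integer n! for large n.
    """
    log2_factorial = math.lgamma(n + 1) / math.log(2)
    return labeled_entropy(n, p) - log2_factorial


if __name__ == "__main__":
    n, p = 1000, 0.1  # example parameters, chosen arbitrarily
    labeled = labeled_entropy(n, p)
    structural = structural_entropy_estimate(n, p)
    print(f"labeled entropy:      {labeled:.1f} bits")
    print(f"structural estimate:  {structural:.1f} bits")
    print(f"saving (log2 n!):     {labeled - structural:.1f} bits")
```

The difference between the two quantities is log2(n!), which grows like n log n. This is the second leading term that the summary says the two-stage algorithm achieves.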