Lossy Asymptotic Equipartition Property for Networked Data Structures

In this article we prove a Generalized Asymptotic Equipartition Property for networked data structures modelled as coloured random graphs. The main technique in this article remains large deviation principles for suitably defined empirical measures on coloured random graphs. We apply our main result to a concrete example from biology.


Introduction
Suppose that a networked data structure $x = \{(x(u), x(v)) : uv \in e\}$, generated by a memoryless source $G$ with distribution $P^{(x)}$, is to be compressed with distortion no greater than $d \ge 0$ using a memoryless random codebook $\hat G$ with distribution $P^{(y)}$. Then the compression performance can be determined by the generalized asymptotic equipartition property (AEP), which states that the probability of finding a $d$-close match between $x$ and any given networked data structure (codeword) $y = \{(y(u), y(v)) : uv \in e\}$ is approximately $2^{-nR(P^{(x)}, P^{(y)}, d)}$. The rate function $R(P^{(x)}, P^{(y)}, d)$ can be expressed as an infimum of relative entropies. The main aim of this article is to extend the results of [DA16] and the references therein.
To be specific, in this article we develop a Lossy AEP for networked data structures modelled as coloured random graphs. We prove a process-level large deviation principle (LDP) for the coloured random graph conditioned to have a given empirical colour measure and empirical pair measure (see Doku-Amponsah [DA06]), using coupling arguments similar to those in the article by Boucheron et al. [BGL02]. From this LDP and the techniques employed by Dembo and Kontoyiannis [DK02] for random fields on $\mathbb Z^2$, we obtain the proof of the Lossy AEP for networked data structures.
We apply our Lossy AEP to the following concrete example from biology. Metabolic network: this is a graph of interactions forming part of the energy-generation and biosynthesis metabolism of the bacterium E. coli. Here the units represent substrates and products, and the links represent interactions; see Newman [13].
The article is organized as follows. The Main Result section contains the main result of the paper, Theorem 2.1. The section LDP for two-dimensional Coloured Random Graph process gives the process-level LDPs, Theorems 3.1 and 3.2, which form the basis of the proof of the main result. The section Proof of Theorems 2.1, 3.1 and 3.2 provides the proofs of all process-level LDPs of the paper and hence of the main result.

Main Result
Consider two coloured random graph processes $X = \{(X(u), X(v)) : uv \in E\}$ and $Y = \{(Y(u), Y(v)) : uv \in E\}$ which take values in $\mathcal G = \mathcal G(\mathcal X)$ and $\hat{\mathcal G} = \hat{\mathcal G}(\mathcal X)$, respectively, the spaces of finite graphs on $\mathcal X$. We equip $\mathcal G(\mathcal X)$ and $\hat{\mathcal G}(\mathcal X)$ with their Borel σ-fields $\mathcal F^{(x)}$ and $\mathcal F^{(y)}$. Let $P^{(x)}$ and $P^{(y)}$ denote the probability measures of the entire processes $X$ and $Y$. By $P^{(x)}_{(\sigma,\pi)}$ and $P^{(y)}_{(\sigma,\pi)}$ we denote the laws of the coloured random graphs $X$ and $Y$ conditioned to have empirical colour measure $\sigma$ and empirical pair measure $\pi$; see, for example, [DA06]. We always assume that $X$ and $Y$ are independent of each other.
By $\mathcal X$ we denote a finite alphabet, and by $\mathcal N(\mathcal X)$ the space of counting measures on $\mathcal X$ equipped with the discrete topology. By $\mathcal M(\mathcal X)$ we denote the space of probability measures on $\mathcal X$ equipped with the weak topology, and by $\mathcal M_*(\mathcal X)$ the space of finite measures on $\mathcal X$ equipped with the weak topology.
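Throughout the rest of the article we assume that $X$ and $Y$ are coloured random graph processes; see [Pe98]. For $n \ge 1$, let $P_n$ denote the marginal distribution of $X$ on $V = \{1, 2, 3, \dots, n\}$ taken with respect to $P$. Let $\rho : \mathcal X \times \mathcal N(\mathcal X) \times \mathcal X \times \mathcal N(\mathcal X) \to [0, \infty)$ be an arbitrary non-negative function and define a sequence of single-letter distortion measures
$$\rho_n(x, y) = \frac{1}{n} \sum_{v=1}^{n} \rho\big(B_x(v), B_y(v)\big),$$
where $B_x(v) = (x(v), L_x(v))$ and $B_y(v) = (y(v), L_y(v))$. Given $d \ge 0$ and $x \in \mathcal G$, we denote the distortion-ball of radius $d$ by
$$\mathcal B(x, d) = \{ y \in \hat{\mathcal G} : \rho_n(x, y) \le d \}.$$
For $(\sigma, \pi) \in \mathcal M(\mathcal X) \times \mathcal M(\mathcal X \times \mathcal X)$, we write … and define the rate function …, where
$$K_{(\sigma,\pi)} \otimes K_{(\sigma,\pi)}\big((a_x, a_y), (l_x, l_y)\big) = K_{(\sigma,\pi)}(a_x, l_x)\, K_{(\sigma,\pi)}(a_y, l_y).$$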
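To make the single-letter distortion concrete, here is a minimal Python sketch. It assumes that $\rho_n$ is the vertex average of $\rho(B_x(v), B_y(v))$ and that $L_x(v)$ records how many neighbours of $v$ carry each colour. Purely for illustration, a graph is a pair (list of vertex colours, list of edges) on the vertex set $\{0, \dots, n-1\}$; the helper names `marks`, `distortion`, `in_distortion_ball` and the example distortion `rho_example` are hypothetical and do not come from the article.

```python
from collections import Counter
from fractions import Fraction


def marks(colours, edges):
    """Return B(v) = (colour of v, L(v)) for every vertex v.

    L(v) is stored as a dict mapping a colour to the number of neighbours
    of v that carry that colour.
    """
    nbr = [Counter() for _ in colours]
    for u, v in edges:
        nbr[u][colours[v]] += 1
        nbr[v][colours[u]] += 1
    return [(colours[v], dict(nbr[v])) for v in range(len(colours))]


def distortion(rho, x, y):
    """rho_n(x, y) = (1/n) * sum_v rho(B_x(v), B_y(v)) on a common vertex set."""
    bx, by = marks(*x), marks(*y)
    if len(bx) != len(by):
        raise ValueError("x and y must live on the same vertex set")
    return sum(Fraction(rho(a, b)) for a, b in zip(bx, by)) / len(bx)


def in_distortion_ball(rho, x, y, d):
    """Membership test for the ball {y : rho_n(x, y) <= d}."""
    return distortion(rho, x, y) <= d


def rho_example(bx, by):
    """Hypothetical bounded distortion: one unit for a colour mismatch,
    one unit for any difference in the neighbourhood counts."""
    (ax, lx), (ay, ly) = bx, by
    return int(ax != ay) + int(lx != ly)


x = (["s", "p", "s"], [(0, 1), (1, 2)])
y = (["s", "p", "p"], [(0, 1)])
print(distortion(rho_example, x, y))  # 1
```

The example distortion is bounded by 2, in line with the boundedness assumption of Theorem 2.1, and exact `Fraction` arithmetic avoids rounding errors exactly at the boundary of the ball.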
By $x \sim p$ we mean that $x$ has distribution $p$. For $(\sigma, \pi) \in \mathcal M(\mathcal X) \times \mathcal M(\mathcal X \times \mathcal X)$, we write … For $n > 1$, we write …
Theorem 2.1. Suppose $X$ and $Y$ are coloured random graphs and assume that $\rho$ is bounded. Then … satisfies an LDP with a deterministic, convex rate function …

Application [DA10]
Metabolic network. We consider the metabolic network of the energy and biosynthesis metabolism of the bacterium E. coli, modelled as a coloured random graph on $n$ nodes partitioned into a block of $n\sigma_n(\text{substrate})$ substrates and a block of $n\sigma_n(\text{product})$ products, with the interactions divided into $n\pi_n(\text{substrate}, \text{product})$, $n\pi_n(\text{product}, \text{substrate})$, $n\pi_n(\text{substrate}, \text{substrate})/2$ and $n\pi_n(\text{product}, \text{product})/2$ interactions of the respective types. Assume that $\sigma_n$ converges to $\sigma$ and $\pi_n$ converges to $\pi$. If we take $\rho(s, r) = (s - r)^2$, then by Theorem 2.1 we have the distortion-rate function …, where subs = substrate and prod = product.

LDP for two-dimensional Coloured Random Graph process
For any $n \in \mathbb N$ we define $\omega_n$ by … Throughout the proof we may assume that $\omega_n(a_x, a_y) > 0$ for all $a_x, a_y \in \mathcal X$, and that $\omega_{n,1}(a_x) = \sigma_n(a_x)$ and $\omega_{n,2}(a_y) = \sigma_n(a_y)$. It is easy to see that the law of the two-dimensional coloured graph conditioned to have empirical colour measure $\sigma_n$ and empirical pair measure $\pi_n$ can be described as follows:
• assign colours to the vertices by sampling without replacement from the collection of $n$ colours, which contains each colour $(a_x, a_y) \in \mathcal X \times \mathcal X$ exactly $n\omega_n(a_x, a_y)$ times;
• for every unordered pair $\{a, b\}$ of colours, create exactly $m_n(a, b)$ edges by sampling without replacement from the pool of possible edges connecting vertices of colour $a$ and $b$, where $m_n(a, b)$ is given by (3.1).
We define the process-level empirical measure $L_n$ induced by $X$ and $Y$ on $\mathcal G \times \hat{\mathcal G}$ by
$$L_n = \frac{1}{n} \sum_{v=1}^{n} \delta_{(B_X(v), B_Y(v))}.$$
Note that we have … The next theorem, the LDP for $L_n$ of the process $(X, Y)$, is the main ingredient in the proof of the Lossy AEP.
Theorem 3.1. The sequence of empirical measures $L_n$ satisfies a large deviation principle in the space of probability measures on $(\mathcal X \times \mathcal N(\mathcal X))^2$ equipped with the topology of weak convergence, with convex, good rate function $I_1$.
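The two-step description of the conditioned law before Theorem 3.1 translates directly into a sampler. The Python sketch below is illustrative only: integer colour counts stand in for $n\omega_n(a_x, a_y)$, per-layer edge counts stand in for $m_n(a, b)$ (assumed integral and feasible), and the names `sample_conditioned_graph`, `marks` and `empirical_measure` are hypothetical. It then forms $L_n$ as the normalized histogram of the vertex marks $(B_X(v), B_Y(v))$.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction

LAYERS = ("x", "y")


def sample_conditioned_graph(n, colour_counts, edge_counts, seed=None):
    """Two-step construction of the conditioned two-dimensional coloured graph.

    colour_counts: {(a_x, a_y): k}, k playing the role of n * omega_n(a_x, a_y).
    edge_counts:   {"x": {frozenset({a, b}): m}, "y": {...}}, m playing the
                   role of m_n(a, b); use frozenset({a}) when a == b.
    """
    rng = random.Random(seed)
    # Step 1: shuffling the pool of n colours = sampling without replacement.
    pool = [c for c, k in colour_counts.items() for _ in range(k)]
    if len(pool) != n:
        raise ValueError("colour counts must sum to n")
    rng.shuffle(pool)
    colour = {"x": [c[0] for c in pool], "y": [c[1] for c in pool]}
    edges = {}
    for h in LAYERS:
        edges[h] = set()
        for pair, m in edge_counts[h].items():
            # Step 2: pool of all possible edges whose endpoint colours are {a, b}.
            candidates = [
                (u, v)
                for u, v in itertools.combinations(range(n), 2)
                if {colour[h][u], colour[h][v]} == set(pair)
            ]
            edges[h].update(rng.sample(candidates, m))
    return colour, edges


def marks(n, colour, edges):
    """(B_X(v), B_Y(v)) with B_h(v) = (h-colour of v, neighbour-colour counts)."""
    nbr = {h: [Counter() for _ in range(n)] for h in LAYERS}
    for h in LAYERS:
        for u, v in edges[h]:
            nbr[h][u][colour[h][v]] += 1
            nbr[h][v][colour[h][u]] += 1

    def b(h, v):
        # Freeze the counting measure so that the mark is hashable.
        return (colour[h][v], tuple(sorted(nbr[h][v].items())))

    return [(b("x", v), b("y", v)) for v in range(n)]


def empirical_measure(mark_list):
    """L_n = (1/n) * sum_v delta_{(B_X(v), B_Y(v))}, with exact weights."""
    n = len(mark_list)
    return {k: Fraction(c, n) for k, c in Counter(mark_list).items()}


colour_counts = {("s", "s"): 2, ("s", "p"): 2, ("p", "p"): 2}
edge_counts = {
    "x": {frozenset({"s", "p"}): 3, frozenset({"s"}): 1},
    "y": {frozenset({"s", "p"}): 2, frozenset({"p"}): 1},
}
colour, edges = sample_conditioned_graph(6, colour_counts, edge_counts, seed=1)
L_n = empirical_measure(marks(6, colour, edges))
print(sum(L_n.values()))  # 1
```

Because $L_n$ puts mass $1/n$ on each vertex mark, integrating any function of the marks against $L_n$ reproduces the corresponding vertex average; the checks below test this for a simple colour-mismatch function.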
The proof of Theorem 3.1 relies on the LDP for $\tilde L_n$ given below.
Theorem 3.2. The sequence of empirical measures $\tilde L_n$ satisfies a large deviation principle in the space of probability measures on $\mathcal X^2 \times \mathcal N(\mathcal X)^2$ equipped with the topology of weak convergence, with convex, good rate function $I_2$ given by …, where …
We denote, for any bin $v \in \{1, \dots, n\}$, by $(\tilde X(v), \tilde Y(v))$ its colours and, for $h = x, y$, by $l_v(b_h)$ the number of balls of colour $b_h \in \mathcal X$ it contains. Now define the empirical process-level occupancy measure $\tilde L^+_n$ of the constellation by … and, for $\nu, \tilde\nu \in \mathcal M(\mathcal X^2 \times \mathcal N^2(\mathcal X))$, the metric
$$d(\nu, \tilde\nu) = \sum_{(a_x, a_y), (l_x, l_y)} \big| \nu\big((a_x, a_y), (l_x, l_y)\big) - \tilde\nu\big((a_x, a_y), (l_x, l_y)\big) \big|.$$
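As this metric generates the weak topology, proving Lemma 3.3 is equivalent to showing that, for every $\varepsilon > 0$,
$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbf P\big( d(\tilde L^+_n, \tilde L_n) > \varepsilon \big) = -\infty,$$
where $\mathbf P$ denotes a suitable coupling between the random allocation model and the coloured graph.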
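For finitely supported measures the metric $d$ is simply a sum of absolute differences over the joint support (an $L^1$, total-variation-type distance). A minimal Python sketch, assuming measures are stored as dicts from marks $((a_x, a_y), (l_x, l_y))$ to weights; the function name `occupancy_distance` and the example marks are hypothetical.

```python
from fractions import Fraction


def occupancy_distance(nu, nu_tilde):
    """d(nu, nu_tilde) = sum over the joint support of |nu(m) - nu_tilde(m)|.

    Measures are dicts mapping a mark ((a_x, a_y), (l_x, l_y)) to its weight;
    a mark missing from a dict has weight 0.
    """
    support = set(nu) | set(nu_tilde)
    return sum(abs(nu.get(m, 0) - nu_tilde.get(m, 0)) for m in support)


# Moving the mass 1/n of a single bin to a different mark changes d by 2/n.
n = 4
m1 = (("s", "p"), ((("p", 1),), (("s", 2),)))
m2 = (("s", "p"), ((("p", 2),), (("s", 2),)))
nu = {m1: Fraction(3, 4), m2: Fraction(1, 4)}
nu_moved = {m1: Fraction(2, 4), m2: Fraction(2, 4)}
print(occupancy_distance(nu, nu_moved))  # 1/2
```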
To begin, denote by $V(a)$ the collection of vertices (bins) that have colour $a \in \mathcal X$, and observe that $\#V(a) = n\omega_n(a)$.
For $h = x, y$ and every $a_h, b_h \in \mathcal X$, proceed as follows. At each step $k = 1, \dots, m_n(a_h, b_h)$ we randomly pick two vertices … Given $a, b \in \mathcal X$, the probability that $V^k_1 = V^k_2$ or that the two vertices are already connected is equal to … We write … and observe that … We define $e(t) = (1 + t)\log(1 + t) - t$ for $t \ge 0$ and use Bennett's inequality (see [Be62]) to obtain, for sufficiently large $n$, … for any $\delta_1 > 0$. Let $\varepsilon > 0$ and choose $\delta_1 = \varepsilon / (2m^2)$. Suppose that $B_n(a_h, b_h) \le \delta_1$ for $h = x, y$. Then, by (3.4), … Hence … Let $0 \le \delta_2 \le 1$. Then, for sufficiently large $n$, we have … This completes the proof of the lemma.
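The function $e(t)$ is the one that appears in Bennett's inequality. The precise bound used in the proof cannot be recovered from the text, so the Python sketch below only illustrates a standard consequence: for a sum $S$ of independent Bernoulli variables with mean $\mu$, $P(S \ge (1 + t)\mu) \le \exp(-\mu e(t))$. It compares this bound with an exact binomial tail; the function names are hypothetical.

```python
import math


def e(t):
    """Bennett's function e(t) = (1 + t) log(1 + t) - t for t >= 0."""
    if t < 0:
        raise ValueError("e(t) is used here only for t >= 0")
    return (1 + t) * math.log1p(t) - t


def bennett_bound(mu, t):
    """Upper bound exp(-mu * e(t)) on P(S >= (1 + t) * mu), where S is a sum
    of independent Bernoulli variables with mean mu."""
    return math.exp(-mu * e(t))


def binomial_upper_tail(N, p, k):
    """Exact P(S >= k) for S ~ Binomial(N, p)."""
    return sum(math.comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(k, N + 1))


# Binomial(100, 0.05) has mean 5; compare the tail beyond twice the mean.
print(binomial_upper_tail(100, 0.05, 10) <= bennett_bound(5, 1.0))  # True
```

The inequality $e(t) \ge t^2 / (2(1 + t/3))$ is the usual route from Bennett's inequality to a Bernstein-type bound; the checks below also test it numerically.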
Proof of Theorems 2.1, 3.1 and 3.2
Proof of Theorem 3.2. We write $\vartheta_1(\varpi_n, \nu_n)$ … and state the following lemma. Denote by $\Sigma^{(n)}(\sigma_n, \pi_n)$ the space of all empirical neighbourhood measures with empirical colour measure $\sigma_n$ and empirical pair measure $\pi_n$.

Proof of Theorem 3.1
We obtain the form of the rate function in Theorem 3.1 by solving the optimization problem
$$\inf\{ I_2(\nu) : \nu \circ \varphi^{-1} = \omega \} = I_1(\omega).$$
Proof. Observe that $C_\varepsilon$ defined above is a closed subset of $\mathcal M$, and so by Theorem 3.1 we have $\limsup$ … We argue by contradiction to show that the right-hand side of (4.4) is negative. Suppose that there exists a sequence $\nu_n$ in $C_\varepsilon$ such that $I_1(\nu_n) \downarrow 0$. Then there is a limit point $\nu \in F_1$ with $I(\nu) = 0$, because $I$ is a good rate function, so its level sets are compact, and the mapping $\nu \mapsto I(\nu)$ is lower semicontinuous. Now
$$\log E_{Q_n} e^{t\rho(B_x(j), B_y(j))} \to \log \big\langle e^{t\rho(B_X, B_Y)}, K_{(\sigma,\pi)} \otimes K_{(\sigma,\pi)} \big\rangle = d_{av}(\sigma, \pi).$$

Also let $D^{(n)}_{\min} := \lim$ …
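$$m_n(a, b) = \begin{cases} n\pi_n(a, b) & \text{if } a = a_x,\ b = b_x \text{ and } a_x \neq b_x,\\ n\pi_n(a, b) & \text{if } a = a_y,\ b = b_y \text{ and } a_y \neq b_y,\\ \frac{n}{2}\pi_n(a, b) & \text{if } a = a_x,\ b = b_x \text{ and } a_x = b_x,\\ \frac{n}{2}\pi_n(a, b) & \text{if } a = a_y,\ b = b_y \text{ and } a_y = b_y. \end{cases} \qquad (3.1)$$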
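The case distinction in (3.1) amounts to one rule per layer: $n\pi_n(a, b)$ edges between distinct colours and $\frac{n}{2}\pi_n(a, a)$ edges within a colour. A minimal Python sketch for one layer, assuming $\pi_n$ is supplied as a symmetric dict of exact fractions and that the resulting counts are integers; `edge_counts_from_pi` and the two-colour example values (substrate `s`, product `p`) are hypothetical.

```python
from fractions import Fraction


def edge_counts_from_pi(n, pi):
    """Edge counts m_n for one layer (h = x or y), following (3.1).

    pi maps ordered colour pairs (a, b) to pi_n(a, b) and is assumed symmetric.
    Returns a dict keyed by the unordered pair frozenset({a, b}).
    """
    counts = {}
    for (a, b), weight in pi.items():
        m = n * Fraction(weight)
        if a == b:
            m /= 2  # edges inside one colour class: n * pi_n(a, a) / 2
        if m.denominator != 1:
            raise ValueError(f"m_n({a}, {b}) = {m} is not an integer")
        key = frozenset((a, b))
        # (a, b) and (b, a) describe the same unordered pair and must agree.
        if counts.setdefault(key, int(m)) != int(m):
            raise ValueError(f"pi_n is not symmetric at {(a, b)}")
    return counts


pi_n = {
    ("s", "s"): Fraction(2, 5),
    ("s", "p"): Fraction(1, 5),
    ("p", "s"): Fraction(1, 5),
    ("p", "p"): Fraction(1, 5),
}
print(edge_counts_from_pi(10, pi_n))
```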
…the two vertices are already connected. If either of these happens, we simply choose an edge at random from the set of all possible edges connecting colours $a_h$ and $b_h$ that are not yet present in the graph. This completes the construction of a graph with $\Phi(\tilde L_{n,1}) = \Phi(\tilde L_{n,2}) = (\omega_n, \pi_n)$ and
$$d(\tilde L^+_n, \tilde L_n) \le \frac{2}{n} \Big[ \sum_{a,b \in \mathcal X} B_n(a_x, b_x) + \sum_{a,b \in \mathcal X} B_n(a_y, b_y) \Big], \qquad (3.4)$$
where $B_n(a, b)$ is the total number of steps $k \in \{1, \dots, m_n(a, b)\}$ at which there is a disparity between the vertices $V^k_1, V^k_2$ drawn and the vertices that formed the $k$th edge connecting $a$ and $b$ in the random graph construction.
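The fallback rule of the coupling can be sketched for a single colour pair. In this hypothetical Python sketch, `coupled_edge_sampling` draws $V^k_1$ uniformly from $V(a_h)$ and $V^k_2$ uniformly from $V(b_h)$; whenever the draw is a loop or repeats an existing edge, it falls back to a uniformly chosen edge that is not yet present. The returned number of fallback steps is what we read as $B_n(a_h, b_h)$, under the assumption that the random allocation side uses exactly the drawn pair at every step.

```python
import random


def coupled_edge_sampling(part_a, part_b, m, seed=None):
    """Create m distinct edges between V(a_h) and V(b_h) using the fallback rule.

    part_a, part_b: vertex lists V(a_h) and V(b_h); pass the same list when
    a_h == b_h. Returns (edge set, number of fallback steps).
    """
    rng = random.Random(seed)
    possible = {frozenset((u, v)) for u in part_a for v in part_b if u != v}
    if m > len(possible):
        raise ValueError("m exceeds the number of possible edges")
    edges, fallbacks = set(), 0
    for _ in range(m):
        # Draw V_1^k from V(a_h) and V_2^k from V(b_h) uniformly at random.
        e = frozenset((rng.choice(part_a), rng.choice(part_b)))
        if len(e) < 2 or e in edges:
            # Loop or repeated edge: choose uniformly among the absent edges.
            fallbacks += 1
            e = rng.choice(sorted(possible - edges, key=sorted))
        edges.add(e)
    return edges, fallbacks


edges, fallbacks = coupled_edge_sampling(list(range(5)), list(range(5, 9)), 10, seed=0)
print(len(edges))  # 10
```

Every step adds a new edge, so exactly $m$ edges are created, while the fallback counter only grows when the uniform draw fails; this is the quantity that the concentration argument needs to control.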
…is the colour distribution in bin $v$. In our first lemma we establish exponential equivalence of the law of the empirical process-level measure $\tilde L_n$ under $P_{(\sigma_n, \pi_n)}$, the law of the coloured random graph conditioned to have colour law $\sigma_n$ and edge distribution $\pi_n$, and the law of the empirical process-level occupancy measure $\tilde L^+_n$ under the random allocation model $\tilde P_{(\sigma_n, \pi_n)}$. Recall the definition of exponential equivalence; see [DZ98, Definition 4.2.10].
Lemma 3.3. The law of $\tilde L^+_n$ under $\tilde P_{(\sigma_n, \pi_n)}$ is exponentially equivalent to the law of $\tilde L_n$ under $P_{(\sigma_n, \pi_n)}$.