A Flexible, Real-Time Algorithm for Simulating Correlated Random Fields and Its Properties

Corresponding Author: Michael A. Kouritzin, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton (Alberta), Canada T6G 2G1. Email: michaelk@ualberta.ca

Abstract: Contemporary real-time problems like CAPTCHA generation and optical character recognition can be solved effectively using correlated random fields. These random fields should be produced on a graph so that problems of any dimension and shape can be handled. However, traditional solutions are often too slow, inaccurate or both. Herein, the Quick Simulation Random Field algorithm for producing correlated random fields on general undirected graphs is introduced. It differs from prior algorithms by completing the graph and setting the unspecified covariances to zero, which facilitates analytic study. The Quick Simulation Random Field graph distribution is derived within and the following questions are studied: (1) For which marginal pmfs and covariances will this algorithm work? (2) When does the marginal property hold, i.e., when does the subgraph distribution of an algorithm-simulated field match the distribution of the algorithm-simulated field on the subgraph? (3) When does the permutation property hold, i.e., when does the vertex simulation order not affect the joint distribution?


Introduction
Correlated random fields are used in science and technology to model spatially distributed random objects. The applications of random fields across the sciences are broad and include sequential Monte Carlo, computer vision, cryptography, astrophysics, rainfall, hydrology, analysis of gene expression time series, medical image processing, inverse optics and image synthesis; see, for example, Kouritzin (2017), Schlather et al. (2015), Chellappa and Jain (1993), Diaconis (2009), Vio et al. (2002), Leblois and Creutin (2013), Li et al. (2008), Li et al. (1995), Li (1995) and Winkler (2003). Furthermore, mathematicians often want to couple a collection of random variables with given distributions together on a single probability space while matching some constraint like covariances. In either situation, the complete joint distribution of the field may be unknown or even irrelevant, as enough meaningful information is captured by the marginal distributions and the pairwise covariances between random variables. In the Gaussian case, many simple, efficient methods exist, such as covariance matrix decomposition, moving averages, the Fast Fourier Transform (FFT), turning bands and local average subdivision (see Shinozuka and Deodatis (1996), Kleiber (2016) or Blanchard et al. (2016) for example). However, these methods are easiest to use over a regular grid and many random fields are fundamentally non-Gaussian. In the general case, probability density functions are usually approximated by probability mass functions (pmfs) if necessary and some type of Markov chain Monte Carlo method is used when exact field distributions are desired. However, these methods require a very large number of iterations to converge (for example, 2000 iterations were needed in the simple Hamlet example of Diaconis (2009)) and are therefore generally not suitable for real-time computation.
On the other hand, there are many approximation methods, often based upon the FFT or spectral decomposition and Karhunen-Loeve expansion to approximate covariance structure of fields (see e.g., Vio et al. (2002)).
To meet the diversity of problems in a variety of dimensions, Kouritzin et al. (2014) considered random fields on a general undirected graph structure and proposed an algorithm for producing a new class of discrete correlated random fields on such graphs by either one-pass simulation or Gibbs-like resampling. The approach has been applied to Optical Character Recognition (OCR) (Kouritzin et al. (2014)) and to the generation of both black-and-white (Kouritzin et al. (2013)) and gray-level (Newton and Kouritzin (2011)) CAPTCHAs (Fig. 1 shows a new example of such a gray-level CAPTCHA). The class of random fields created by their algorithm incorporates given probability mass functions (pmfs) at the vertices of a graph and specified pairwise covariances corresponding to edges existing in that graph. (In the CAPTCHA example, this translates into a pmf for the gray levels of each pixel and covariances between nearby pixels.) The joint distribution between pairs of vertices connected by a specified covariance edge is known in terms of two sets of auxiliary parameter pmf collections that can be selected for generality. However, the joint subgraph distribution on an incomplete subgraph is unknown for the algorithm in Kouritzin et al. (2013).
The starting point for the simulation consists of a fixed portion as well as a design portion. The fixed portion is an undirected graph together with the desired marginal vertex pmfs (the π's) and the collection of nonzero covariances (the β's) for the graph edges. (This setting is general enough to handle simulation in any dimension, for example.) The design portion consists of two sets of auxiliary (vertex) pmfs (the π̂'s and the π̃'s) that can be used in place of the π's in portions of the algorithm to do things like improve efficiency or destroy independence. (Actually, there is a wide assortment of reasonable choices for the π̂'s and the π̃'s, discussed in Kouritzin et al. (2014).) Simulating the graph then amounts to directing the graph in an acyclic manner, fixing a topological sort of the vertices and using Proposition 1 of Kouritzin et al. (2014), requoted as Proposition 1 below, recursively (see Kouritzin et al. (2014) for details). Our modified algorithm, introduced herein, completes the graph by adding edges of zero covariance wherever necessary before simulation. This completion neither complicates nor slows the simulation, yet allows us to derive the complete field distribution in closed form for all possible auxiliary pmf parameters. We call this completed-graph simulation algorithm and the resulting random field the quick simulation algorithm and the quick simulation field herein. This paper focuses on the constraints and properties of the random field generated by this quick simulation algorithm. Naturally, the algorithm cannot work for all possible parameters. We start by giving the joint (field) distribution of the random field generated by this algorithm (when it works). From there, we study regularity, meaning when the algorithm does provide a legitimate distribution over the whole space of vertices.
This is equivalent to ensuring that the recursive formula (2.5) of Proposition 1 produces a conditional pmf in every iteration. It was observed in our CAPTCHA (Kouritzin et al. (2013)) and OCR (Kouritzin et al. (2014)) applications that the occasional illegitimate conditional pmf value outside [0, 1] can be replaced with a value inside without noticeable effect on the simulation. However, it is still important to know when the only possible source of irregularity is numeric rather than algorithmic. Next, we establish the marginality property, which ensures that the distribution of a random field on a subgraph, projected from the random field constructed on the whole graph, is the same as that of a random field constructed directly on this subgraph. Finally, we investigate the permutation property, which ensures that the random fields simulated from all topological sorts corresponding to the same complete undirected graph have the same probability distribution. We establish necessary and sufficient conditions for this permutation property.

Example 1
Suppose we have a complete undirected graph G with vertices v_1, v_2, v_3 together with given probability mass functions and pairwise covariances. Using the construction below, we assign joint probabilities that match these marginals and covariances. Moreover, if we simulate just two of the vertices v_i, v_j, then we get the same pairwise distribution, so marginality is also maintained.
In this note, we show how to compute these probabilities so that the pmfs and covariances are preserved in general as well as establish the conditions for the marginality and permutation properties above to hold.
The remainder of this note is laid out as follows: Section 2 contains our notation and background. Next, we give the closed form of correlated random field, discuss regularity and establish the marginality property in Section 3. The permutation property is studied in Section 4.

Probabilistic Setup
Let V be a finite set of vertices, 𝒱 denote this set of vertices with an ordering and X_v be a finite state space for each v∈V. For any nonempty subsequence B ⊂ V, let X_B = ∏_{v∈B} X_v and X = X_V; we write x_i in place of x_{v_i} to ease notation. A random field Π is a strictly positive probability measure on X. The random vector X = (X_v)_{v∈V} on the probability space (X, 2^X, Π) is also called a random field. For B ⊂ V, the random subfield on B is the projection X_B = (X_v)_{v∈B}. Suppose ∂ = {∂v ⊂ V : v∈V} is a collection of subsets of V. A random field Π is Markov with respect to ∂ if, for all x∈X, Π(X_v = x_v | X_u = x_u, u ≠ v) = Π(X_v = x_v | X_u = x_u, u∈∂v).

Problem Statement
Let E be a set of edges, where each (u, v) ∈ E with u, v∈V has no orientation but indicates that u and v are neighbors of each other. Then, G = (V, E) is an undirected graph. If for every pair of vertices u, v∈V there is a path of edges in E connecting u and v, then G is connected. If every vertex in G has a neighbor with at least two neighbors, then G is sufficiently connected. If for every pair of non-neighbor vertices z, u there is a neighbor of z and a neighbor of u that are distinct, then G is disjoint pair rich. We illustrate the new concepts of sufficiently connected and disjoint pair rich in the examples that follow.

Example 2
Consider the graphs in Fig. 2.
Both graphs in Fig. 2 are connected. However, neither is sufficiently connected since in both cases none of the neighbors of w have two neighbors.

Example 3
The graphs in Fig. 3 illustrate the definition of "disjoint pair rich".
In (B) non-neighbors z and u do not have distinct neighbors. Clearly in (C), every vertex has a neighbor with two neighbors and every pair of non-neighbors has distinct neighbors. Yet, two vertices only have one neighbor.

Example 4
If every vertex in a graph G has two neighbors, then it is disjoint pair rich. It is also sufficiently connected. We are interested in creating a random field over V, where the random variable X_v at each vertex v∈V has a prescribed pmf π_v and the random vectors (X_u, X_v) have a prescribed non-zero covariance β_uv (= β_vu) for each (u, v)∈E. Naturally, this problem could be ill-posed in the sense that there are mathematically incompatible collections of pmfs and covariances. Also, there are often multiple solutions, with some being more efficient to simulate and others having nice properties like the marginal and permutation properties defined above.
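One simple instance of such incompatibility comes from the Cauchy-Schwarz inequality: any coupling must satisfy |β_uv| ≤ σ_u σ_v, where σ_u, σ_v are the standard deviations under π_u, π_v. A minimal feasibility screen along these lines follows; the pmfs and function names are illustrative, not from the paper, and the bound is necessary but not sufficient:

```python
import math

def sigma(support, pmf):
    """Standard deviation of a pmf on a finite support."""
    m = sum(x * p for x, p in zip(support, pmf))
    return math.sqrt(sum((x - m) ** 2 * p for x, p in zip(support, pmf)))

# Hypothetical marginals on {0, 1} and a requested edge covariance.
support = [0, 1]
pi_u, pi_v = [0.5, 0.5], [0.9, 0.1]
beta_uv = 0.2

# Cauchy-Schwarz: no joint law can have covariance exceeding sigma_u * sigma_v.
bound = sigma(support, pi_u) * sigma(support, pi_v)
print(abs(beta_uv) <= bound)  # False: 0.2 exceeds 0.5 * 0.3 = 0.15, so ill-posed
```

Passing this screen does not guarantee a solution exists, but failing it shows the pmf/covariance collection is mathematically incompatible.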

Directed Graph
The random variables in the field are simulated in sequence. The first step towards sequencing is directing the graph. Let A be a set of ordered vertex pairs, called arcs (indicating that the first vertex in the pair is simulated prior to the second). Then, D = (V, A) is a directed graph.

Graph Completion
Given a directed graph D = (V, A), its completion D̄ = (V, Ā) is the directed graph with an arc between every pair of vertices, where the direction of any arc that is also in A matches that of A. Kouritzin et al. (2014) gives one possible algorithm to construct an acyclic complete directed graph and a topological sort on V, i.e., a simulation order v_1,…,v_N, where N = |V| is the number of vertices. Our new Quick Simulation Algorithm works on a completed acyclic directed graph. Zero covariances are placed along any added arc, i.e., β_{v,u} = cov(X_u, X_v) = 0 when (v, u) ∈ Ā \ A.
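As a small illustration of this completion step, the sketch below directs all edges according to a chosen vertex order (any orientation consistent with a fixed order is automatically acyclic, and that order is a topological sort) and then fills in every missing pair with a zero-covariance arc. The data layout is an assumption for illustration, not the paper's implementation:

```python
def complete(vertices, beta):
    """vertices: list fixing the simulation order (a topological sort).
    beta: dict mapping frozenset({u, v}) -> covariance for original edges.
    Returns all arcs (u, v) with u before v, adding zero covariances."""
    arcs = {}
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            arcs[(u, v)] = beta.get(frozenset({u, v}), 0.0)
    return arcs

# A 3-vertex graph with one specified edge; the other two arcs get beta = 0.
arcs = complete(["v1", "v2", "v3"], {frozenset({"v1", "v2"}): 0.05})
print(arcs)  # {('v1', 'v2'): 0.05, ('v1', 'v3'): 0.0, ('v2', 'v3'): 0.0}
```

Since the added arcs carry zero covariance, they do not change the simulated field (Remark 3) but make the completed graph available for the closed-form analysis.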

Conditional Probability Update
The Quick Simulation Random Fields match a collection of pmfs {π_v, v∈V} and a collection of covariances {β_{u,v} : (u, v)∈E}. However, there are also two auxiliary pmf parameter sets that provide flexibility in the choice of field distribution as well as simulation. (See Kouritzin et al. (2014) for examples of choices for these auxiliary pmfs.) They also appear in the conditional probability update through the functions g̃ and ĥ, which may look mysterious here. However, looking ahead to (3.3), we see they affect the field distribution in our new algorithm. g̃ normalizes the sample x_v by subtracting the mean and dividing by the variance, but it allows this normalization to be done with respect to any convenient non-trivial pmf π̃ that could be different than π. ĥ allows us to consider all the parents, except the one whose covariance we are currently setting, as if they came from a different distribution π̂. Intuitively, this makes sense: when we are focused on the covariance for one parent, the other parents could just as easily have come from π̂ as from π. The following proposition, the main proposition in Kouritzin et al. (2014), establishes that this flexibility is allowed.

Proposition 1
Suppose {π_v : v∈V} are pmfs and {β_{v,u} : (u, v)∈A or (v, u)∈A} are numbers such that the right hand side of (2.5), computed according to (2.4), is non-negative. Form the conditional probabilities recursively using (2.5), starting from the first vertex in the topological sort. Then, the random field X, defined by the multiplication rule and (2.5), has marginal probabilities {π_v} and covariances cov(X_u, X_v) = β_{u,v}.
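The simplest concrete instance of this update is the two-vertex case with π̃_v = π_v. The normalization described above then gives g̃_v(x) = (x − μ_v)/σ_v², and a single-parent conditional of the form p(x_v | x_u) = π_v(x_v)(1 + β g̃_u(x_u) g̃_v(x_v)). The sketch below (illustrative numbers; the two-vertex form is reconstructed under these simplifying assumptions, not quoted from (2.5)) verifies numerically that this building block matches both marginals and the target covariance exactly:

```python
import numpy as np

def g_tilde(support, pmf):
    """g(x) = (x - mean) / variance: the normalization described for g-tilde."""
    mean = np.dot(support, pmf)
    var = np.dot((support - mean) ** 2, pmf)
    return (support - mean) / var

# Hypothetical pmfs on {0, 1, 2} and a target covariance for the edge (u, v).
support = np.array([0.0, 1.0, 2.0])
pi_u = np.array([0.2, 0.5, 0.3])
pi_v = np.array([0.4, 0.4, 0.2])
beta = 0.05

gu, gv = g_tilde(support, pi_u), g_tilde(support, pi_v)

# Conditional update p(x_v | x_u) = pi_v(x_v) * (1 + beta * g(x_u) * g(x_v)):
# each row sums to one because g integrates to zero against its own pmf.
cond = pi_v[None, :] * (1.0 + beta * np.outer(gu, gv))
joint = pi_u[:, None] * cond  # joint pmf over (x_u, x_v)

mu_u, mu_v = support @ pi_u, support @ pi_v
cov = np.sum(joint * np.outer(support - mu_u, support - mu_v))
print(np.allclose(joint.sum(axis=1), pi_u),
      np.allclose(joint.sum(axis=0), pi_v))  # True True: marginals preserved
```

The zero-mean property of g̃ is what makes the marginals come out exactly; regularity (non-negativity of `cond`) must still be checked, as in Remark 6 below.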

Remark 1
The term non-trivial pmfs can be interpreted as follows: each π̃_v should have non-zero variance and each π̂_v should be strictly positive. These auxiliary pmfs affect the field distribution but not its marginal vertex pmfs nor its vertex-vertex covariances.

Remark 2
In Kouritzin et al. (2014), there was the stronger constraint that the right hand side of (2.5) be in [0, 1]. However, the right hand side of (2.5) always sums to one over x_v ∈ X_v. Hence, if the right hand side of (2.5) is non-negative, then it is in [0, 1] and (2.5) defines a legitimate conditional pmf.

Remark 3
Notice that (2.5) gives the same value whether we consider the given graph D or its completion D̄, where the added arcs have zero covariance.

Distribution and Marginality of Quick Simulation Fields
Proposition 1 can be extended to give the full field distribution when the graph is complete.

Proposition 2
Assume {π_v : v∈V} are pmfs and {β_{v_j,v_i} : 1 ≤ j < i ≤ N} are numbers such that the right hand side of (3.1) is non-negative. Then, the random field X defined by (3.1) has the closed-form field distribution (3.3) for each x_{v_i} ∈ X_{v_i} and n = 1,…,N.

Remark 4
The one-pass algorithm (as opposed to the Gibbs-type algorithm used in Kouritzin et al. (2013)) follows from (3.1): we just use the conditional probability to simulate the new vertex given the prior ones in the topological sort. However, the big efficiency gain comes from the fact that the terms in (3.1) are only non-zero (and hence need to be computed) when v_j is a parent of v_i in the original (non-completed) graph.
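To make the one-pass idea concrete, here is a sketch on a three-vertex path graph, again under the simplifying assumptions π̃ = π and a single parent per vertex, with the conditional update p(x_v | x_u) = π_v(x_v)(1 + β g̃_u(x_u) g̃_v(x_v)) reconstructed as above (all numbers illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
support = np.array([0.0, 1.0])
pis = [np.array([0.5, 0.5]), np.array([0.6, 0.4]), np.array([0.3, 0.7])]
betas = [0.05, 0.04]  # covariances along the arcs (v1, v2) and (v2, v3)

def g(pmf):
    """Normalize by subtracting the mean and dividing by the variance."""
    m = support @ pmf
    return (support - m) / (((support - m) ** 2) @ pmf)

def simulate_once():
    """One pass down the topological sort v1 -> v2 -> v3."""
    xs = [rng.choice(support, p=pis[0])]  # root sampled from its marginal
    for i in range(1, len(pis)):
        gu = g(pis[i - 1])[support == xs[-1]][0]  # g-tilde at the parent's value
        cond = pis[i] * (1.0 + betas[i - 1] * gu * g(pis[i]))
        xs.append(rng.choice(support, p=cond))
    return xs

samples = np.array([simulate_once() for _ in range(20000)])
print(samples.mean(axis=0))  # close to the pmf means (0.5, 0.4, 0.7)
```

Each vertex is visited once and only its parent's term is computed, which is the efficiency point of this remark; the empirical vertex means match the prescribed marginals up to Monte Carlo error.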

Remark 5
Since the terms with

Remark 6
Regularity means that the right hand side of (3.1) is a conditional pmf. As noted in Remark 2, the right hand side of (3.1) need only be non-negative, which is equivalent to (3.5) and can be checked during the iteration. Here, pa(v_i) denotes the parents of v_i in the original (not-completed) graph. If pa(v_i) = {v_{i−1}} is a singleton, then (3.5) further simplifies by (2.2) to (3.6). One can check (3.5) or (3.6) iteratively to ensure the Quick Simulation algorithm is producing a field with the desired pmfs and covariances. Equality in (3.6) is hit when we have a singleton parent and one value of X_{i−1} precludes another value of X_i.

Proof of Proposition 2
a) This follows immediately from Proposition 1 and the fact that the parents of v_i are all of v_1,…,v_{i−1} when the graph is complete. b) Note that (3.3) holds for n = 1. Now, we assume it is true for n−1 for some n∈{2,…,N} and show it for n.
(3.1) is equivalent to (3.9), which is just the distribution we would have arrived at if we had simulated {v_1,…, v_{l−1}, v_{l+1},…, v_N} in order. Using (3.9) repeatedly, we have proved the following marginality lemma.

Lemma 1
Suppose the conditions of Proposition 2 hold and B ⊂ V. Then, the distribution of the subfield X_B, projected from the quick simulation field on V, coincides with the distribution of the quick simulation field constructed directly on B.
This example illustrates several things about Quick Simulation Fields: order matters in general, there are dependent uncorrelated fields, and independence generally does not happen when π̂_v ≠ π_v. Indeed, we explain below that there is usually dependence even when π̂_v = π_v.
Example 6
In the important special case where π̂_v = π_v for all v, the closed form becomes:

Remark 7
The following proof reveals the equivalence of (1) and (2) holds even if the original graph G is not connected.
(2) implies (1): Multiplying (4.1) by g̃_u(x_u) yields (4.9), which holds for all x_u∈X_u, x_v∈X_v, x_w∈X_w and distinct u, v, w∈M_N. Take a∈G_N and let b = (l l+1) ∘ a for 1 ≤ l ≤ N−1. Noting that the transpositions (l l+1) are generators, we just need to show (4.6) for (arbitrary) a and this b. However, the left hand terms in (4.6) with i < l; j > l+1; j ≤ l−1, l+2 ≤ i; and j = l, i = l+1 directly cancel with the corresponding right hand terms for this b. Considering the (remaining) terms on the left side of (4.6) with j ≤ l−1 and i = l, l+1 for this b and using (4.9) with u = a(j), v = a(l), w = a(l+1), we get (4.10) upon manipulation, so they are equal to the corresponding terms on the right of (4.6). (Notice the switch of π and π̂ in the final factors of (4.10).) Finally, the terms on the left of (4.6) with j = l, i ≥ l+2 and j = l+1, i ≥ l+2 for b = (l l+1) ∘ a are just the terms on the right of (4.6) with j = l+1, i ≥ l+2 and j = l, i ≥ l+2, i.e., in reverse order. Hence, by breaking the summation up, we have shown that (4.6) holds for arbitrary a and b = (l l+1) ∘ a, which implies (4.6) holds for arbitrary a, b, and sufficiency follows.
(3) implies (2): Letting u, v, w∈V be distinct and using (4.2) and (4.3), we have that: is constant, which in turn implies: where c_u, c_v are constants. Since G is sufficiently connected (by non-zero covariances), every vertex can be included in some connected triple as above and we must have that:

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and there are no ethical issues involved.