Parallel Approaches for Intervals Analysis of Variable Statistics in Large and Sparse Linear Equations with RHS Ranges

: This study proposes an algorithm capable of working in parallel for solving large and sparse linear equations under given right hand side (RHS) ranges. A comparative study to the direct linear programming method is reported theoretically, computationally and discussed. Moreover, the approach can be adapted for the system under domain decompositions structure leading to a better efficiency experimentally.


INTRODUCTION
Min (Max) x k , k = 1,2,….,n (P1) Subject to n ∑ a ij x j -b i = 0 , i = 1,2,…...,n j=1 x j unrestricted, l j ≤ b j ≤ u j , j = 1,2,…..,n The above model becomes more complex when n is large. Usually, this situation arises in approximately solving linear boundary partial differential equations. Applying finite approximation schemes normally leads to the result of very large and sparse linear equations. By representing uncertainties of equation right hand sides and boundary conditions as a set of statistical confidence intervals, the solution as a set of related variable ranges can be solved using the above formulation. Directly, the problem can be solved as a set of independent 2n LP problems. However, this paper will present another approach theoretically equivalent but computationally much faster as n grows and the proposed method is suitable for use in parallel architecture. Additionally, the approach is illustrated for further applications on obtaining confidence intervals of variable variance/covariance matrix for a given ranges of right hand side variance/covariance.
Linear programming (LP) has been one of the major tools in solving real world resource allocation problems. Its origin and early development historically were described and presented extensively by Dantzig [1] . The problem of determining the confidence interval of statistics also has long been studied by statisticians [2] . A large and sparse system of linear equations normally arises in finite approximations scheme for solving boundary value ordinary and partial differential equation basically found in engineering analysis and computational physics. The resulting system is sometimes too large to be handled by the direct approach and the iterative indirect methods are frequently more appropriate [3,4] . The integration of these three aspects has not yet been studied systematically but its application is motivated by industrial nature of dynamic diffusion or transport phenomena where repetitive operations cause a chance effect leading to a stationary uncertainty of system input or initial/ boundary conditions.
For a more complex system where the resulting equations size can be very large, parallel computations can be more efficient. Some early works on parallel and vector solutions for large linear systems were presented in Heller [5] , Ortega [6] and Stone [7] . Distributed-memory based parallel algorithms on basic matrix-vector multiplication especially in case of large blockdistributed matrices were initially studied in Dekel et al. [8] . This research area has been received much attention and several new approaches on different applications based on new software and hardware advancements have been proposed as found in Ben-Asher and Haber [9] , Catalyurek and Aykanat [10] , Hendrickson et al. [11] , Ogielski and Aiello [12] , Romero and Zapata [13] and Ujaldon et al. [14] . Extensively, these basic parallel operations plays a major role on further developments of parallel iterative algorithms for large and sparse linear systems as illustrated in Basserman [15] and Ortigosa et al. [16] . For a much larger scale system such as time varying three dimensional partial differential equations as found in the fields of solid mechanics and diffusion processes, domain decompositions [17,18] have been a major support for developments of parallel and distributed computing. Their main idea is to partition the original domain to several smaller domains resulting in a set of much less dependent smaller systems of linear equations that can be handled by each processor in parallel. Then, their solutions are adjusted and integrated with a set of coupling equations under manageable sizes related to common variables among sub-problems. A well illustrated example can be found in Saad [4] . Recently, with the advancements of parallel hardware, parallel and distributed algorithms for the direct Gauss elimination approach have been proposed as other alternatives expected theoretically that their efficiency should be better than the iterative algorithms as presented in Balasubramanya et al. [19] , Chandra and Siva Ram Murthy [20] , Gallivan et al. [21] and Tufo and Fischer [22] . A special case of applications on both parallel matrix-vector multiplication and paralleldistributing for solutions of large and sparse linear system is due to solving the steady state discrete and continuous Markov chain model as presented recently in Benzi and Tuma [23] .
It is well known that solving an LP using the classical simplex method is an exponentially bounded [24] algorithm but its average performance was derived by Borgwardt [25] resulting the complexity of O(m) iterations with at most O(m 2 ) operations per iterations where m is the number of constraints in the model. In this study as shown in problem P1, regardless of the bounded variables constraint, m=n. Therefore, solving the problem directly is exponentially bounded in the worst case and polynomial bounded as O(n 4 ) on average. In this work, an alternative algorithm will be proposed. Its main computation complexity deals with computing A -1 leading to a better bound of O(n 3 ). To design basically a parallel scheme for the algorithm, most works reviewed in the previous paragraph can be adapted and applied. An example in case of domain decomposition will be preliminary studied.
In summary, there has been a tremendous amount of works on solving large and sparse linear equations in parallel due to its applications and computer hardware advancements. These works can be used as basic tools to support the purpose of solving problem P1 by the upcoming proposed algorithm. In the next section, a set of parallel approaches will be proposed and verified theoretically and experimentally.

MATERIALS AND METHODS
j=1 In order to solve the corresponding LP problem, the following theorem can be established.

Theorem:
The solution vector X for the problem (P1) is optimal if and only if (1) is satisfied and: for each k such that x k is maximized, for all j, b' j = u' j if a' kj > 0, otherwise, b' j = 0; for each k such that x k is minimized, for all j, b' j = u' j if a' kj < 0, otherwise, b' j = 0; Proof: By considering equation (1), substitute b' j by u' j when a' kl > 0 and by 0 otherwise. The result is n n x k * =∑ a' kj u' j + ∑ a' kj l j (2) j=1 (a' kj >0) j=1 Comparing to equation (1), x k * ≥ x k since b' j ≤ u' j for all j. Similarly, in case of the substitutions when a' kj < 0, x k * ≤ x k . Thus, the forward parts of the theorem hold.
To prove the backward statement, suppose that the solution obtained from equation (2) is not optimal and there exists another solution for b' j = c' j for all j that maximizes x k . The only alternative that can produce x k larger than x k * is to assign at least one c' j > u' j . This is infeasible since b' j ≤ u' j . Therefore, the reverse condition holds in case of the maximization objective and again, similarly for the minimization case.
Additionally, it is obvious to conclude that the above procedure is a polynomial time algorithm with the complexity of O(n 3 ) because the computation of A -1 is the method bottleneck. In summary, a general method for solving problem P1 is to compute A -1 and use substitute b' j in equation (1) according to the theorem. However, when A is very large and sparse, in case of bandwidth matrix normally found in finite approximation for solving boundary value ordinary and partial differential equation, A -1 is fully dense. For an example, a partial differential equation in two dimensions with confidence interval of boundary value is approximated by subdividing 1000 discrete points in each domain leading to the system of linear equations with bandwidth matrix A and one million variables/equations (n = 1000×1000 = 1,000,000). Solving for all possible variable ranges requires a high performance computing hardware facility with an appropriate parallel algorithm. One obvious direct approach is to distribute 2n independent LP problems of P1 among parallel processors in order to minimize the maximum allocated processing time. Nevertheless, a more efficient method adopted from the theorem can be modified to work in parallel by solving independently for each row elements of A -1 as follows.
The above algorithm is designed to break the whole operations into n independent and almost identical jobs. Each job is concerned with solving for a' kj , j = 1,2,..,n representing by the vector z and can be distributed to each available computer and the result of u k and l k can be integrated at the end. Therefore, the decision problem is how to allocate n jobs among all available m parallel processors in order to minimize the total makespan. The difference from the original scheduling research is that in this case, the exact processing time of each job is not known in advance but each job has the same instructions and the effect resulting different processing time is the different right hand side and the number of nonzero z j for all j in each iteration k.
Comparing with the direct approach of distributing 2n LP problems, firstly, the complexity of P1 is polynomial bounded (O(n 3 ) for the sequential processing and O(n 3  and K[bb T ] may not be possible especially in the case that b is measured from some experiments or randomly generated to intimate the real situation. If a steady state does exist, both statistical parameters can be statistically estimated as confidence intervals (Chapters 7 and 10 of Lindgren [2] ). Therefore, the decision problem is to determine the confidence interval of statistical parameters of X (E[X], K[XX T ]) for a given confidence interval E The problem of determining the confidence interval of E[X] can be solved directly using the method described in the previous section. However, the problem concerning K[XX T ] is not obviously linked. Next, consider the following algebraic analysis.
First, multiply equations i th and k th of an n by n linear system and apply the expectation leading to the equation as follows. n n ∑ ∑ E[a ij a kl x j x l ] = E[b i ,b k ] i, k= 1,2,…...,n j=1 l=1 By the symmetry property of covariance elements, the equation can be adapted as follows. n n n ∑ E[a ij a kj x j 2 ]+2{∑ ∑ E[a ij a kl x j x l ]} = E[b i ,b k ] i,=1,2,…,n j=1 j=1 l=j+1 k=i,i+1,..,n Then, expand the expected value of each term in the equation and obtain the following equation with variance/co-variance as variables. n n n ∑ a ij a kj Var[ j=1 l=j+1 i,=1,2,…...,n k=i,i+1,..,n (3) Therefore, the above resulting equations system is linear and for a given range or confidence interval of K[b i ,b k ] for all i and k, the proposed method in the previous section can be used to solve for the range of Var[x j ] and K[x j, x l ] for all j and l as well. However, in case of large and sparse linear system, n can be very large with the total number of equations/variables of n(n+1)/2. The system of corresponding linear system is so large that a single processor computing system is insufficient for handling this matter. Hence, a parallel algorithm is proposed as follows.
First, the linear system concerning variances and co-variances of X can be written alternatively as the following matrix equation.

AK[XX T ]A T = K[bb T ]
Determining K[XX T ] can be performed by solving the above linear system. A simple and direct approach is to find A -1 and substitute the following relation.

K[XX T ] = [A -1 ]K[bb T ][A -1 ] T (4)
For A as a dense matrix, the above matrix solution can be operated within O(n 3 ) consisting of one n by n matrix inversion and two n by n matrix multiplications. Algebraically, each element in K[XX T ] can be obtained by the following equation. n n K[x i ,x k ] = ∑ ∑ a' ij a' kl K[b j , b l ] j=1 l=1 i= 1,2,…,n k= i,i+1,…,n (5) Suppose that a given confidence range K[b j , b l ] is [l ij , u ij ] and A is large and sparse, a parallel computing scheme can be created and designed using the relationship (5). The total number of independent computational jobs is N = n(n+1)/2 where each job has the general description derived in a similar manner as the proposed procedure for the expected value case as follows.
Step 2: Let Max = 0 and Min = 0 For j = 1:n For l = 1,:n If y j z l > 0 Max = Max + y j z l u jl Min = Min + y j z l l jl Otherwise, Max = Max + y j z l l jl Min = Min + y j z l u jl End End Step 3: K[x i ,x k ] is within the range of [Min, Max] Suppose that there are m available computers where m < N, a number of jobs can be assigned to each machine and the processing time in step 1 can be reduced when either i or k is fixed and another index is varied. For an example, when a computer is assigned a set of jobs with i = 1 and vary k from 1 to p < n, computing in step 1 is performed merely in the full form in case of i=1 and k=1. For another k > 1, the processing time in this step can be reduced in half since only solving Az = e k is needed. An application in domain decompositions: In this section, both methods developed in the previous sections are applied to the case where A is formulated by the principle of domain decomposition. In general, the structure of the corresponding equations system is as follows.
where s is the number of sub-domain, B i represents coefficients square matrix of sub-domain i with X i , c i , p i and q i as the associated variable, RHS, lower and upper bound vectors, respectively and E i and F i represent coefficients matrices of interaction between sub-domain i and the common domain. The matrix C represents the coefficients square matrix of the common domain with Y, d, r and t as the associated variable, RHS, lower and upper bound vectors, respectively. To solve for the ranges of mean vectors X i , for all i and Y, the following model can be derived based on the similar principle of (1) in a matrix notation.
For confidence intervals of variable co-variance sub-matrix according to each domain interaction i and j, procedure P2 can be adapted in the following matrix form. s s K ij =Σ Σ A' ik A' jl B kl , i,j = 0,1,2,..,s (11) k=0 l=0 where K ij represents sub-matrix of co-variances between X i and X j when i,j = 1,2,..,s and between X i(j) and Y when j(i) = 0 and B kl represents sub-matrix of co-variances between c i and c j when i,j = 1,2,..,s and between c i(j) and d when j(i) = 0. To verify the correctness of equations (6) -(11), consider AX = b formed by equations (6) and (7) as follows Suppose that A -1 is in the following form.
By using the definition of A' ij , it can be proven that the matrix property of AA -1 holds. Thus, A -1 is valid and can be used in both equations (9) and (10) as claimed. Also, as seen in the block structure described in equations (9), (10) and (11), firstly, the matrix G -1 must be computed and then, parallel and distributed computing can be independently allocated to each processor for determining X i , Y and K ij for all i and j.
Computational experience: In this study, the heat transfer equation (∂ 2 Ø/∂x 2 + ∂ 2 Ø/∂y 2 = f(x, y)) approximated by 5 Points finite difference approach is used with randomly generated ranges of f(x, y) represented by l(x, y) and u(x, y), x = 0, ∆, 2∆, …., N 1 ∆ and y = 0, ∆, 2∆, …., N 2 ∆. ∆ is varied under the constraint that N 1 ∆ = N 2 ∆ = 1. Excluding boundary elements, the total number of variables (n) for the corresponding set of linear equations is (N 1 -1)(N 2 -1) in case of the expected value. The proposed method was coded in Matlab [26] utilizing its special commands for sparse linear equations solver. For each problem size n, a sample of variables range containing 10 observations is obtained and its average is computed and summarized in the following table. Simultaneously, the same set of problems is solved using the direct LP command, "Linprog[.]", of software Matlab [26] . The required computation times are reported in the following The results from both tables indicate that the proposed method is more efficient as the problem size grows. All computations are performed on a microcomputer system with CPU speed of 2.4 Ghz and 1 Gbytes RAM. Moreover, the proposed method is extended to determine a sample of covariance elements under various n and their average computing times are analyzed using 10 observations for each problem sizes and can be shown as follows. Since all test cases in the above table require full dense variance/co-variance matrices, computations of each variable variance/co-variance range consume up to O(n 2 ) limiting the much smaller upper bound of n as compared to the case of expected value with O(n). The size of n > 8000 is prohibited due to available memory. Nevertheless, in practice, it is most likely to find that the resulting variance/co-variance matrix can be approximated as a sparse matrix. In this circumstance, a set of such large and sparse K[bb T ] is randomly generated as banded matrices with k as the number of nonzero elements in each row. Then, ten samples of each n ranging from 10,000 to 500,000 are tested utilizing MATLAB powerful commands on sparse matrix operations [26] based on Procedure P2. The experimental results are as shown below. The result in the above table indicates some possibilities to conduct parallel computations in case of large and sparse variance/co-variance matrices. For a much larger scale problem, matrix partitioning and interpolations should be further developed. It should be noted that all computations have been conducted based on the assumption that solving the corresponding linear systems can be performed on a single processor and the proposed approach is designed to partition the whole problem to several independent linear systems capable of distributing one by one among parallel processor. However, when the matrix A becomes too large to be handled by a single processor, a set of clusters among processors has to be designed. Each cluster consists of a set of designed processors capable of solving each related system. To support this concept in case of Matlab, parallel programming is required and their basic study on matrix operations was experimented in [27] .
Next, a preliminary study on the use of the direct approach P1 compared to using equations (9)-(10) when the problem is in the form of equations (6)-(8) is conducted as follows. First, the heat equation with Lshape boundary is decomposed to 200 domains with 2500 points (variables) in each domain and 5,000 common domain points with generated data of right hand side lower and upper limits. Then, the resulting equations in the non-decomposition mode consists of a sparse 500,000 variables system while the domain decomposition system is involved with sparse 2,500×2,500 B i , 2,500×5,000 F i , 5,000×2,500 E i (i =1, 2,.., 200) and 5,000×5,000 C. The experiment has been conducted using 4 identical processors as specified previously, working in parallel with the additional main server (2.4 GHz and 2 Gbytes RAM) working as jobs manager. For the direct approach, the average time out of ten samples for computing a variable interval is 2,917 sec. which is slightly less than the previous experiment case of 500,000 variables (2940 sec.). With five parallel processors, the expected computing time per element can be reduced by one-fifth resulting almost ten minutes (2917/5). To solve the problem using equations (9) for a specific i, G -1 is computed in the main server firstly. Then, computing A' ij , j = 0,1,2,..,s (=200 in this case) is distributed equally to each processor (=50) and A' 0j is computed in parallel by the main server. After that, the sub-vector interval for X i is computed using equation (9) and algorithm P1 using the main server. At the end, 2,500 intervals are obtained. In this study, i =1 is conducted as an example resulting the total computation time of 57 hrs. 21 min. and 6 sec. Therefore, the average time for computing a variable interval is reduced to merely one and a half minute per element. This basic study initially illustrates a further direction for continuous improvement on implementing algorithm P1 using parallel processing. For algorithm P2, similar scheme can be conducted by using equations (11). It should be noted that the parallel strategy used in this case has not been fully developed and can be improved. For examples, computing G -1 and equation (9) with i=1 have not been parallelized yet. Further explorations should be investigated.

CONCLUSION
In this study, parallel approaches for determining confidence intervals of variables in the large and sparse linear equation system with RHS ranges is proposed and preliminary tested its application possibilities. The basic concept has been verified and validated mathematically. Comparisons with the direct linear programming approach are conducted in case of the mean confidence interval. The test results illustrate that the approach greatly improves outcome efficiency. Moreover, the proposed approach can be adapted to work in parallel using domain decomposition. An example is preliminary studied and the initial result indicates a further improvement.