Assessing the Appropriateness of a Spatial Regression Using the Generalized (h1h2)-Slepian Random Field

Corresponding Author: Wayan Somayasa, Department of Mathematics, Halu Oleo University, 93232 Kendari, Indonesia. Email: wayan.somayasa@uho.ac.id

Abstract: In this study we derive asymptotic goodness-of-fit tests (model checks) for spatial regression. The critical regions and the p-values of the tests are approximated from the distributions of integral functionals of the generalized (h1h2)-Slepian field and of the set-indexed Gaussian white noise. These random fields are obtained as the limit processes of the moving sums and the cumulative sums processes of a sequence of random matrices consisting of independent and identically distributed random variables, indexed by the points of a design constructed by means of a given continuous probability measure. Whereas the common approach to model diagnostics in regression is based on functionals of the residuals, this study proposes a different idea: investigating directly the moving and the cumulative sums of the array of observations. It is shown that these approaches are mathematically tractable and more applicable in practice. A simulation study is conducted to investigate the finite-sample behavior of the tests. An application of the procedure to mining data is also discussed, where, from the perspective of geology and geophysics, a polynomial model is reasonable and suitable for the data.


Introduction
Modelling spatial data using the spatial random field (process) approach has been increasingly studied in various scientific disciplines, among them agriculture, environmental sciences, geostatistics, geology, medicine, biology and the mining industry. In the statistical literature on spatial analysis, the real-valued variable observed at a space coordinate is usually regarded as a realization of a stochastic process indexed by a set of points or by a family of sets, commonly called a random field, cf. Cressie (1993), Ripley (2004) and Wackernagel (2003). The measured variable in spatial analysis might stand for the percentage of Ni, Fe or Au in mining exploration, see e.g., Tahir (2010), Somayasa et al. (2015a; 2015b), Somayasa and Wibawa (2015) and Somayasa et al. (2016), or for the incidence rates of breast cancer, cf. MacNeill et al. (1994). We refer to (Cressie, 1993) for a comprehensive review and bibliography.
One important purpose of the statistical analysis of spatial data is the optimal prediction of an unobserved part of the process. In the spatial data analysis literature this type of statistical inference is called kriging. As stated in the references mentioned above, the result of universal kriging depends heavily not only on the covariance structure of the spatial process but also on the adequacy of the assumed model for the mean of the observed variable (Cressie, 1993; Ripley, 2004; Wackernagel, 2003). Hence preliminary diagnostics, involving a model validity check as well as model selection, must be conducted before kriging to prevent wrong conclusions. A serious spatial analysis should therefore be accompanied by a statistical inference that checks whether or not the assumed model fits the sample. Many approaches and procedures for such a model check have been proposed in the literature; we refer to Stute (1997) and Stute et al. (2008) for complete bibliographical information.
The aim of the present paper is to contribute to model diagnostics for univariate spatial regression by establishing asymptotic test procedures for checking the appropriateness of an assumed model defined on a closed rectangle under an arbitrary experimental design. By combining the setups of the Goodness-of-Fit (GoF) hypothesis for regression defined in Arnold (1981) and in Eubank and Hart (1993), we propose an asymptotic test whose statistic is a Riemann-Stieltjes integral of a function from a competing alternative with respect to the so-called spatial Moving Sums (MOSUM) process of the array of observations. For brevity we call this test the MOSUM test in the sequel. This statistic is shown to converge in distribution (weakly) to the integral of that regression function with respect to a generalized (h1h2)-Slepian field when the hypothesis is true, and the critical region of the MOSUM test is determined from the probability distribution of this random field. We show in the appendix that the generalized (h1h2)-Slepian field is obtained as the limit process of the MOSUM process of a sequence of random matrices consisting of independent and identically distributed random variables with finite first and second moments, indexed by design points constructed from an arbitrary probability measure. We note that the ordinary (h1h2)-Slepian field studied in Fuchang and Li (2007) and Bischoff and Gegg (2014) is the spatial process obtained as the limit when the design points are constructed from the uniform probability measure.
The application of spatial processes such as the Brownian sheet (two-parameter Brownian motion), the Brownian pillow and the set-indexed Brownian sheet to GoF as well as Lack-of-Fit (LoF) tests for spatial regression has been investigated in many works. A common feature of most of them is that the hypothesis is tested through a continuous functional of the Cumulative Sums (CUSUM) process of the residuals, and the critical region is developed by studying the principal components of the corresponding functional of the Gaussian processes stated above. For example, Stute (1997) and Stute et al. (2008) proposed the Kolmogorov-Smirnov functional of the so-called marked empirical process based on the residuals. MacNeill and Jandhyala (1993) and Xie and MacNeill (2006) investigated the Cramér-von Mises functional of the CUSUM process of the residuals for detecting boundaries in spatial regression. Geometric approaches have been proposed by Bischoff (2002), Bischoff and Somayasa (2009) and Somayasa et al. (2015a). However, most of these papers have restricted applicability, because computing the quantities of the limiting distribution of the test statistic is mostly intractable as the dimension of the experimental region gets large. We show in this study that our proposed method is more widely applicable.
To assess the sensitivity of the MOSUM test, we carry out a comparison study with a similar test defined through the integral of the regression function under the alternative with respect to the univariate Gaussian white noise, a random field defined e.g., in Pyke (1983), Alexander and Pyke (1986), Gaenssler (1993) and Lifshits (2012). This statistic is the limit of the integral with respect to the CUSUM process of the array of observations, and for convenience we call the corresponding test the CUSUM test. The behavior of the tests is studied by investigating their empirical and limiting power functions by simulation.
The paper is organized as follows. In Section 2 we give a precise definition of the model and the hypotheses under study. The MOSUM test and its asymptotic distribution are treated in detail in Section 3. Throughout this work the test procedures are derived under general conditions that cover a situation frequently encountered in mining and geological engineering, where for practical reasons the engineers cannot or will not place the drilling bores equidistantly. We propose an experimental design that generalizes the approach introduced in Somayasa (2013): the design points are constructed over the experimental region by means of a given probability measure defined on that region. This makes our test procedures more applicable in practice, at the cost of a more involved mathematical derivation. In Section 4 we study the CUSUM test. The finite-sample behavior of the tests is investigated by simulation in Section 5, and in Section 6 we discuss the application of the methods to real data. The paper closes with some conclusions and remarks on future work. Proofs of propositions, theorems and corollaries are postponed to the appendix.

Model Definition
In this section we briefly review the sampling scheme, the regression model and the hypotheses under study; for more detail the interested reader is referred to (Somayasa, 2013; Somayasa et al., 2015a). We consider an experimental design consisting of n1 × n2 points $\Xi_{n_1\times n_2} := \{(t_k, s_\ell) : 1 \le k \le n_1,\ 1 \le \ell \le n_2\} \subset D$, where $D := [a_1, a_2] \times [b_1, b_2]$. The design most commonly used in the literature is the regular lattice (Bischoff and Somayasa, 2009; MacNeill and Jandhyala, 1993; MacNeill et al., 1994; Somayasa et al., 2015a). From a practical point of view this kind of design is sometimes not efficient. Mathematically it is associated with the uniform probability measure defined by the scaled Lebesgue measure
$\lambda^{2*}_D(B) := \frac{1}{|D|}\int_D \mathbf{1}_B \, d\lambda^2, \quad B \in \mathcal{B}(D),$
where $\lambda^2$ is the Lebesgue measure on $\mathbb{R}^2$, $|D| = \lambda^2(D)$ is the area of D and $\mathbf{1}_B$ is the indicator of B. For the n1 × n2 regular lattice it can be shown by applying the well-known Portmanteau theorem (Billingsley (1999), pp. 18-19) that $P_{n_1 n_2} \Rightarrow \lambda^{2*}_D$ as $n_1, n_2 \to \infty$, where $P_{n_1 n_2}$ is the probability measure putting mass $1/(n_1 n_2)$ on each design point. Here and throughout the paper "⇒" stands for convergence in distribution in the sense of (Billingsley, 1999). Based on this fact we go in the opposite direction and ask: given a continuous probability measure P on the Borel σ-algebra B(D) with distribution function F, can we construct a design $\Xi_{n_1\times n_2}$ such that $P_{n_1 n_2} \Rightarrow P$? Such a design is not necessarily a regular lattice unless P is the uniform probability measure on B(D) (Somayasa, 2013). For convenience we call the probability measure P under which $\Xi_{n_1\times n_2}$ is constructed a design.
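The Portmanteau argument for the regular lattice can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the lattice points are taken to be $t_k = a_1 + k(a_2-a_1)/n_1$ and $s_\ell = b_1 + \ell(b_2-b_1)/n_2$, and the helper names `regular_lattice`, `empirical_cdf` and `uniform_cdf` are hypothetical. It compares the share of lattice points in a lower-left rectangle $[a_1,t]\times[b_1,s]$ with the $\lambda^{2*}_D$-probability of that rectangle.

```python
import numpy as np

def regular_lattice(n1, n2, a=(0.0, 1.0), b=(0.0, 1.0)):
    """Regular n1 x n2 lattice on D = [a1, a2] x [b1, b2] (assumed form:
    t_k = a1 + k (a2 - a1) / n1, s_l = b1 + l (b2 - b1) / n2, k, l >= 1)."""
    t = a[0] + np.arange(1, n1 + 1) * (a[1] - a[0]) / n1
    s = b[0] + np.arange(1, n2 + 1) * (b[1] - b[0]) / n2
    return np.meshgrid(t, s, indexing="ij")

def empirical_cdf(T, S, t, s):
    """P_{n1 n2}([a1, t] x [b1, s]): the share of design points in the rectangle."""
    return np.mean((T <= t) & (S <= s))

def uniform_cdf(t, s, a=(0.0, 1.0), b=(0.0, 1.0)):
    """lambda*_D([a1, t] x [b1, s]): area of the rectangle divided by |D|."""
    return ((t - a[0]) * (s - b[0])) / ((a[1] - a[0]) * (b[1] - b[0]))

# Discrepancy at the point (0.33, 0.71) for growing lattices.
for n in (5, 50, 500):
    T, S = regular_lattice(n, n)
    print(n, abs(empirical_cdf(T, S, 0.33, 0.71) - uniform_cdf(0.33, 0.71)))
```

The printed discrepancies shrink as the lattice is refined (they are at most of order 1/n), in line with $P_{n_1 n_2} \Rightarrow \lambda^{2*}_D$.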
As an example, consider the probability measure P on the measurable space (D := [1, 2] × [2, 3], B(D)) with probability density function u(t, s) := 12/(t²s²) and distribution function F(t, s) := 12(1 − 1/t)(1/2 − 1/s), for 1 ≤ t ≤ 2 and 2 ≤ s ≤ 3, illustrated in Fig. 1. The distribution functions F1(t) := 2(1 − 1/t) on [1, 2] and F2(s) := 6(1/2 − 1/s) on [2, 3] satisfy F(t, s) = F1(t)F2(s) for (t, s) ∈ D. Then for fixed n1 ≥ 1 and n2 ≥ 1 the design points are computed by the formula
$(t_k, s_\ell) := \left(F_1^{-1}(k/n_1),\ F_2^{-1}(\ell/n_2)\right), \quad 1 \le k \le n_1,\ 1 \le \ell \le n_2,$
where in this example $F_1^{-1}(u) = 2/(2-u)$ and $F_2^{-1}(v) = 6/(3-v)$. The observation, the value of the regression function g and the random error at the design point $(t_k, s_\ell)$ are denoted throughout this paper as $Y_{k\ell}$, $g_{k\ell}$ and $\varepsilon_{k\ell}$, respectively. It is important to note that our results do not require the normality assumption.

As noted in (Arnold, 1981; Eubank and Hart, 1993; Stute, 1997), GoF tests for regression share the following framework. Let V := [z0, ..., zp, zp+1, ..., zm] and W := [z0, ..., zp], p ≤ m, be the linear subspaces of L²(D, P) spanned by the known regression functions z0, ..., zm, which without loss of generality are assumed to be orthogonal in L²(D, P). The space L²(D, P) is the set of square-integrable functions on D with respect to P, furnished with the inner product ⟨·,·⟩_P and the norm ||·||_P. We test the null hypothesis that g ∈ W while g is known to lie in V. Since W ⊆ V, g admits an orthogonal decomposition g ≡ g1 ⊕ g2, where g1 ∈ W, g2 ∈ V ∩ W⊥ and ⟨g1, g2⟩_P = 0. Hence testing H0: g ∈ W within V amounts to testing the hypotheses
$H_0: g_2 \equiv 0 \quad \text{against} \quad H_1: g_2 \not\equiv 0. \qquad (2)$
Note that g ∈ W is equivalent to g2 ≡ 0. We work on rectangles in ℝ²; results for higher-dimensional rectangles can be obtained immediately.
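For the example above the quantile construction can be computed in closed form. This is a minimal Python sketch, assuming the design points are the marginal quantiles $(F_1^{-1}(k/n_1), F_2^{-1}(\ell/n_2))$; the inverses $F_1^{-1}(u) = 2/(2-u)$ and $F_2^{-1}(v) = 6/(3-v)$ follow by solving $F_1(t) = u$ and $F_2(s) = v$. The function names are hypothetical.

```python
import numpy as np

# Marginal CDFs from the example: F1 on [1, 2], F2 on [2, 3], and F = F1 * F2.
def F1(t):
    return 2.0 * (1.0 - 1.0 / t)

def F2(s):
    return 6.0 * (0.5 - 1.0 / s)

# Closed-form inverses obtained by solving F1(t) = u and F2(s) = v.
def F1_inv(u):
    return 2.0 / (2.0 - u)

def F2_inv(v):
    return 6.0 / (3.0 - v)

def quantile_design(n1, n2):
    """Design points (F1^{-1}(k/n1), F2^{-1}(l/n2)), k = 1..n1, l = 1..n2 (assumed form)."""
    t = F1_inv(np.arange(1, n1 + 1) / n1)
    s = F2_inv(np.arange(1, n2 + 1) / n2)
    return np.meshgrid(t, s, indexing="ij")

T, S = quantile_design(4, 3)
print(np.round(T[:, 0], 4))  # t-coordinates, spacing grows away from t = 1
print(np.round(S[0, :], 4))  # s-coordinates, spacing grows away from s = 2
```

Because the density 12/(t²s²) is largest near the corner (1, 2), the design points cluster there, unlike the points of a regular lattice.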

Test based on Moving Sums of the Observations
In this section we introduce a test based on the spatial Moving Sums (MOSUM) process of the observations, extending the notion of the one-dimensional MOSUM test defined in Chu et al. (1995). Let h1 and h2 be numbers such that 0 < h1 ≤ a2 − a1 and 0 < h2 ≤ b2 − b1. The moving sums process is built from the matrix of observations $\mathbf{Y}_{n_1\times n_2} = (Y_{k\ell})$ (compare the partial sums processes in Bischoff and Somayasa, 2009; Somayasa et al., 2015a). The required property can be verified from the assumption F(t, s) = F1(t)F2(s), the continuity of F1 and F2 on [a1, a2] and [b1, b2], respectively, and the monotonicity of F1.
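Since the exact definition of the moving sums process is not reproduced in this section, the following Python sketch rests on an assumption of ours: the moving sum at a window origin (t, s) is taken to be the sum of the observations whose design points fall in $(t, t+h_1] \times (s, s+h_2]$, scaled by $1/\sqrt{n_1 n_2}$. The function `mosum_process` is hypothetical and not the authors' definition; it accepts arbitrary (non-lattice) designs and also counts how many observations each window contains.

```python
import numpy as np

def mosum_process(T, S, Y, h1, h2, origins_t, origins_s):
    """Illustrative spatial moving-sums process (hypothetical definition).

    For every window origin (t, s) the observations whose design points lie in
    (t, t + h1] x (s, s + h2] are summed and scaled by 1 / sqrt(n1 * n2).
    T, S, Y are arrays of the same shape holding the design coordinates and
    the observations; the design need not be a regular lattice.
    """
    T, S, Y = (np.asarray(a, dtype=float).ravel() for a in (T, S, Y))
    scale = np.sqrt(Y.size)
    out = np.empty((len(origins_t), len(origins_s)))
    for i, t in enumerate(origins_t):
        in_t = (T > t) & (T <= t + h1)
        for j, s in enumerate(origins_s):
            in_window = in_t & (S > s) & (S <= s + h2)
            out[i, j] = Y[in_window].sum() / scale
    return out

# Usage on a 20 x 20 regular lattice of the unit square with a 0.25 x 0.25 window.
rng = np.random.default_rng(0)
grid = np.arange(1, 21) / 20
T, S = np.meshgrid(grid, grid, indexing="ij")
Y = rng.normal(size=T.shape)
origins = 0.0125 + 0.05 * np.arange(15)   # offset avoids ties with lattice points
MS = mosum_process(T, S, Y, 0.25, 0.25, origins, origins)
# Number of observations per window: the same for every origin on this lattice.
counts = mosum_process(T, S, np.ones_like(Y), 0.25, 0.25, origins, origins) * np.sqrt(Y.size)
print(MS.shape, np.unique(np.round(counts)))
```

On the regular lattice every window holds the same number of observations (here 25), which is what "window size" refers to; under a non-uniform design P the count varies with the local mass of P.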
Further, each moving sum is formed from the observations $Y_{k\ell}$ whose design points fall in a window of dimensions h1 × h2. Hence, following the terminology of (Bischoff and Gegg, 2014; Chu et al., 1995), we call h1h2 the window size of the process.
Suppose that the observations follow Model (1), where the unknown regression function g ∈ L²(D, P) has an orthogonal decomposition g ≡ g1 ⊕ g2 with g1 ∈ W and g2 ∈ V ∩ W⊥, and suppose that g1 and g2 are continuous and have bounded variation on D.
Then an asymptotic size α test for H0: g2 ≡ 0 against H1: g2 ≢ 0 uses as its critical value $m_{1-\alpha}$, the (1 − α)-quantile of the distribution of the limit I(S_{l;P}). Furthermore, the finite-sample power function of this test is also obtained. We note that all integrals involved in the definition of I(S_{l;P}) are Riemann-Stieltjes integrals as defined, e.g., in Stroock (1994).
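Because the statistics are Riemann-Stieltjes integrals with respect to two-parameter processes, it may help to see the corresponding discrete sum. The following Python sketch (the function name `rs_integral_2d` is hypothetical) computes $\sum_{i,j} f(t_i, s_j)\,\Delta_{ij}S$ on a rectangular partition, where $\Delta_{ij}S$ is the rectangular increment of S over a cell; evaluating f at the upper-right corner of each cell is our choice. It is checked against $S(t,s) = ts$, the distribution function of Lebesgue measure on $[0,1]^2$.

```python
import numpy as np

def rs_integral_2d(f, S_values, t_grid, s_grid):
    """Two-dimensional Riemann-Stieltjes sum of f with respect to S.

    S_values[i, j] = S(t_grid[i], s_grid[j]). Each cell (t_{i-1}, t_i] x (s_{j-1}, s_j]
    contributes f(t_i, s_j) times the rectangular increment
        S(t_i, s_j) - S(t_{i-1}, s_j) - S(t_i, s_{j-1}) + S(t_{i-1}, s_{j-1}),
    i.e. f is evaluated at the upper-right corner of the cell.
    """
    TT, SS = np.meshgrid(t_grid, s_grid, indexing="ij")
    increments = np.diff(np.diff(S_values, axis=0), axis=1)
    return float(np.sum(f(TT[1:, 1:], SS[1:, 1:]) * increments))

# Check against S(t, s) = t * s, the distribution function of Lebesgue measure on [0, 1]^2.
grid = np.linspace(0.0, 1.0, 201)
TT, SS = np.meshgrid(grid, grid, indexing="ij")
total_mass = rs_integral_2d(lambda t, s: np.ones_like(t), TT * SS, grid, grid)
mean_ts = rs_integral_2d(lambda t, s: t * s, TT * SS, grid, grid)  # approximates 1/4
print(total_mass, mean_ts)
```

For a simulated sample path of a moving sums or cumulative sums process evaluated on the same grid, `S_values` is simply that array of path values.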

Remark 3.2
Unfortunately, the probability distribution of I(S_{l;P}) is not tractable for an arbitrary probability measure P, because the increments of S_{l;P} are not stochastically independent and the limit I(S_{l;P}) still depends on the choice of g1, unless g1 and f1 are orthogonal in L²(D, P). Therefore, in practice the test is implemented by approximating $m_{1-\alpha}$ by Monte Carlo simulation. However, when the design is constructed under $\lambda^{2*}_D$ (regular lattice), it can be shown that $I(S_{l;\lambda^{2*}_D})$ has independent increments, see Proposition A.6.
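The Monte Carlo approximation of $m_{1-\alpha}$ can be sketched generically: simulate the statistic many times under H0 and take the empirical (1 − α)-quantile. In this Python sketch the routine `mc_critical_value` is hypothetical and draws standard normal errors for convenience (the theory itself does not require normality). Since a discretized version of the MOSUM integral is not reproduced here, the routine is checked on a toy statistic, $\sum f_1(x)\varepsilon(x)/\sqrt{n}$ with a hypothetical direction $f_1$, whose exact quantile $\Phi^{-1}(1-\alpha)\|f_1\|$ is known.

```python
import numpy as np
from statistics import NormalDist

def mc_critical_value(statistic, shape, alpha=0.05, runs=10_000, seed=None):
    """Monte Carlo approximation of the (1 - alpha)-quantile of `statistic`
    under H0: draw i.i.d. standard normal error arrays of the given shape,
    evaluate the statistic on each and take the empirical quantile."""
    rng = np.random.default_rng(seed)
    draws = np.array([statistic(rng.standard_normal(shape)) for _ in range(runs)])
    return float(np.quantile(draws, 1.0 - alpha))

# Toy statistic with a known normal distribution, used only to check the routine.
grid = (np.arange(1, 21) - 0.5) / 20
T, S = np.meshgrid(grid, grid, indexing="ij")
f1 = (T - 0.5) + (S - 0.5)          # hypothetical alternative direction

def toy_statistic(eps):
    return np.sum(f1 * eps) / np.sqrt(eps.size)

m_mc = mc_critical_value(toy_statistic, T.shape, alpha=0.05, seed=1)
m_exact = NormalDist().inv_cdf(0.95) * np.sqrt(np.mean(f1 ** 2))
print(m_mc, m_exact)
```

For the MOSUM test one would pass a function that evaluates the discretized MOSUM integral on the simulated sample instead of `toy_statistic`.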
As pointed out in Proposition A.7, the finite-sample power function of the MOSUM test can be shown to converge pointwise to a limiting power function.

Comparison to Set-Indexed CUSUM Method
In this section we establish a different approach to testing hypothesis (2): instead of the moving sums, we base the test statistic on the cumulative sums process of the observations indexed by D. This type of stochastic process is a special case of the more general set-indexed partial sums process (Alexander and Pyke, 1986; Gaenssler, 1993; Pyke, 1983; Xie and MacNeill, 2006). The CUSUM test can be viewed as a generalization of the MOSUM test. The MOSUM test differs from the CUSUM test in that each moving sum contains a fixed number of observations, whereas the cumulative sums incorporate more and more observations. We can therefore conjecture that the MOSUM test should be more sensitive than the CUSUM test in detecting a change in the model, see also (Chu et al., 1995). We investigate this sensitivity property by comparing the finite-sample power functions of both tests.
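On a regular lattice, the set-indexed partial sums evaluated on lower-left rectangles $[a_1, t_k] \times [b_1, s_\ell]$ reduce to two-dimensional cumulative sums. The Python sketch below assumes this lattice setting and the scaling $1/\sqrt{n_1 n_2}$; `cusum_process` and `block_sum` are hypothetical names. It also shows the contrast drawn above: a fixed-size block sum of the kind a moving sum uses is recovered from the cumulative sums by inclusion-exclusion, whereas the last cumulative sum uses all observations.

```python
import numpy as np

def cusum_process(Y):
    """Partial sums over lower-left blocks of the observation matrix:
    CU[k, l] = (1 / sqrt(n1 * n2)) * sum_{i <= k, j <= l} Y[i, j]."""
    Y = np.asarray(Y, dtype=float)
    return Y.cumsum(axis=0).cumsum(axis=1) / np.sqrt(Y.size)

def block_sum(CU, k0, k1, l0, l1):
    """Scaled sum of Y[k0:k1, l0:l1], recovered from CU by inclusion-exclusion."""
    total = CU[k1 - 1, l1 - 1]
    if k0 > 0:
        total -= CU[k0 - 1, l1 - 1]
    if l0 > 0:
        total -= CU[k1 - 1, l0 - 1]
    if k0 > 0 and l0 > 0:
        total += CU[k0 - 1, l0 - 1]
    return total

rng = np.random.default_rng(0)
Y = rng.normal(size=(7, 14))
CU = cusum_process(Y)
print(CU[-1, -1] * np.sqrt(Y.size), Y.sum())  # the last cumulative sum uses every observation
print(block_sum(CU, 2, 5, 3, 9) * np.sqrt(Y.size), Y[2:5, 3:9].sum())  # a 3 x 6 block
```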
The following theorem gives the asymptotic size α test for testing (2). The proof is deferred to the appendix.

Theorem 4.1
Let g have an orthogonal decomposition g ≡ g1 ⊕ g2 with g1 ∈ W and g2 ∈ V ∩ W⊥ as functions in L²(D, P), and suppose that g1 and g2 are continuous and have bounded variation on D. Then an asymptotic size α test for hypothesis (2) rejects H0 when the CUSUM statistic exceeds Φ⁻¹(1 − α)||f1||_P. An immediate consequence of Theorem 4.1 is the asymptotic power function of the test, presented in the following corollary.
Here Φ denotes the cumulative distribution function of the standard normal distribution.
In contrast to the MOSUM test, the CUSUM test can be carried out in practice relatively easily, because the limiting distribution of its statistic is normal with an explicitly known variance. The p-value is therefore $1 - \Phi(t^*/\|f_1\|_P)$, where $t^*$ is the observed value of $CU^*(\mathbf{Y}_{n_1\times n_2})$, and decisions are made mainly on the basis of the p-value instead of computing the critical value Φ⁻¹(1 − α)||f1||_P.
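Both the critical value and the p-value of the CUSUM test are one-line computations. This Python sketch assumes, as the text states, that the statistic is asymptotically $N(0, \|f_1\|_P^2)$ under H0 and that H0 is rejected for large values; the function names are hypothetical. The check uses the value $\|f_1\|_P = 0.37045$ reported for the constant model in Section 5.

```python
from statistics import NormalDist

def cusum_critical_value(norm_f1, alpha=0.05):
    """Asymptotic critical value Phi^{-1}(1 - alpha) * ||f1||_P."""
    return NormalDist().inv_cdf(1.0 - alpha) * norm_f1

def cusum_p_value(t_star, norm_f1):
    """One-sided p-value P(N(0, ||f1||_P^2) >= t*) = 1 - Phi(t* / ||f1||_P)."""
    return 1.0 - NormalDist().cdf(t_star / norm_f1)

print(round(cusum_critical_value(0.37045, 0.05), 5))  # 0.60934
```

An observed value exactly at the critical value gives a p-value of α, so the two decision rules agree.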

Simulation Study
In this section we present a simulation study that approximates the quantiles of the limiting test statistics and the finite-sample power functions of the tests.

Constant Model
In the first case we assume under H0 a constant model Y(t, s) = β0 z0(t, s) + ε(t, s), while the observations follow the first-order model Y(t, s) = β0 z0(t, s) + β1 z1(t, s) + β2 z2(t, s) + ε(t, s). The simulated (1 − α)-quantiles of the corresponding limiting statistic are presented in Table 2.

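Here z0(t, s) = 1, z1(t, s) = t and z2(t, s) = s, while z3, z4 and z5 are the second-order terms in t and s; their orthogonalized versions are denoted by z̃0, ..., z̃5.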
We present the simulation results for the choice f1 ≡ z5 in Table 2, generating the observations under H0 from the first-order model. We also show that the finite-sample power function $\Psi_{n_1 n_2;P}(f)$ of the CUSUM test lies close to its limit $\Psi_P(f)$ as f varies in V ∩ W⊥ when the sample sizes n1 and n2 get large. The simulation is based on 10000 runs implemented in R.

Constant Model
By considering the hypothesis formulated in Subsection 5.1.1, we have ||f1||_P = 0.37045 for f1 ≡ z̃1 + z̃2, so that Φ⁻¹(1 − α)||f1||_P = 0.60934 when α = 0.05. To let f vary in V ∩ W⊥, we define f ≡ λf1 for λ ∈ ℝ and generate the errors independently from a normal distribution with variance σ², which is assumed to be unknown and is estimated by the consistent estimator defined in (Arnold, 1981). The simulation results show that the empirical power function $\Psi_{n_1 n_2;P}$ approximates the limit $\Psi_P$ well, attaining the size α = 0.05 at λ = 0, as it should.
Fig. 2. The graphs of the empirical power functions of the asymptotic size α = 0.05 CUSUM test for the constant model (dotted lines), approximated by the limit power function Ψ_P(λf1) (smooth lines). The design points are generated using the CDF F(t, s) = 12(1 − 1/t)(1/2 − 1/s) on D.
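The size and power experiment can be imitated on a small scale. This Python sketch is not the authors' simulation (which used R, 10000 runs, the design F and Arnold's variance estimator); it is a simplified stand-in under loudly stated assumptions: a centred regular lattice on $[0,1]^2$, a hypothetical direction $f_1(t,s) = (t - 0.5) + (s - 0.5)$, the discretized statistic $\sum f_1 Y / (\sqrt{n}\,\hat\sigma)$, the local alternative $\lambda f_1/\sqrt{n}$, and the residual standard deviation after fitting a constant as $\hat\sigma$.

```python
import numpy as np
from statistics import NormalDist

def cusum_rejection_rate(n1, n2, lam=0.0, alpha=0.05, runs=2000, seed=0):
    """Monte Carlo rejection rate of a simplified CUSUM-type test (sketch).

    Design: centred regular lattice on [0, 1]^2. H0 model: constant mean 2.0.
    Alternative direction: f1(t, s) = (t - 0.5) + (s - 0.5), added as the local
    alternative lam * f1 / sqrt(n). Statistic: sum(f1 * Y) / (sqrt(n) * sigma_hat),
    rejected when it exceeds Phi^{-1}(1 - alpha) * ||f1||.
    """
    rng = np.random.default_rng(seed)
    t = (np.arange(1, n1 + 1) - 0.5) / n1
    s = (np.arange(1, n2 + 1) - 0.5) / n2
    T, S = np.meshgrid(t, s, indexing="ij")
    n = n1 * n2
    f1 = (T - 0.5) + (S - 0.5)
    norm_f1 = np.sqrt(np.mean(f1 ** 2))          # norm under the design measure
    crit = NormalDist().inv_cdf(1.0 - alpha) * norm_f1
    rejections = 0
    for _ in range(runs):
        Y = 2.0 + lam * f1 / np.sqrt(n) + rng.standard_normal((n1, n2))
        sigma_hat = Y.std(ddof=1)                # residual sd after fitting a constant
        stat = np.sum(f1 * Y) / (np.sqrt(n) * sigma_hat)
        rejections += int(stat > crit)
    return rejections / runs

print(cusum_rejection_rate(10, 10), cusum_rejection_rate(30, 30))  # sizes near 0.05
print(cusum_rejection_rate(30, 30, lam=3.0))                        # power at lambda = 3
```

At λ = 0 the rejection rate stays close to the nominal 5%, and it increases once λ moves away from zero, which is the qualitative behavior described for Fig. 2.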

First Order Model
We simulate the power function of the CUSUM test for the setup considered in Subsection 5.1.2, using the corresponding CUSUM statistic. Next, we consider the same test problem with a slight modification: the experimental design is now a regular lattice of size n1 × n2 on the experimental region I² := [0, 1] × [0, 1], and the regression functions {z0, z1, z2, z3, z4, z5} are replaced by their orthogonalized versions with respect to this design. The simulation results for the two situations are presented in Fig. 4 and 5, respectively. They show that, independently of the design strategy, the limiting power function gives a relatively good approximation to the finite-sample power function of the CUSUM test. Both quantities attain the specified level of 10% when λ is set to zero, even for moderate sample sizes. Thus, in practice the test can be carried out by directly calculating the quantiles of the limiting distribution of the test statistic.
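Orthogonalizing the regression functions with respect to a design is a Gram-Schmidt step. The following Python sketch approximates $L^2(I^2, \lambda^2)$ by the empirical inner product $\langle f, g\rangle_n = \frac{1}{n}\sum f g$ of a regular lattice and uses a QR factorization, which performs Gram-Schmidt up to the signs of the columns. The ordering $\{1, t, s, t^2, ts, s^2\}$ of the second-order monomials is our assumption, and the function name is hypothetical.

```python
import numpy as np

def orthonormal_basis_values(T, S):
    """Values at the design points of an orthonormalized version of
    {1, t, s, t^2, t*s, s^2} with respect to <f, g>_n = mean(f * g).
    QR factorization performs Gram-Schmidt (columns determined up to sign)."""
    t, s = T.ravel(), S.ravel()
    X = np.column_stack([np.ones_like(t), t, s, t**2, t * s, s**2])
    Q, _ = np.linalg.qr(X)          # Q has orthonormal columns: Q.T @ Q = I
    return Q * np.sqrt(t.size)      # rescale so that mean(z_i * z_j) = delta_ij

n1 = n2 = 15
grid = np.arange(1, n1 + 1) / n1
T, S = np.meshgrid(grid, grid, indexing="ij")
Z = orthonormal_basis_values(T, S)
gram = Z.T @ Z / Z.shape[0]         # empirical Gram matrix of z~0, ..., z~5
print(np.round(gram, 10))
```

Replacing the lattice by design points generated from another distribution F changes the orthogonalized functions, which is why they must be recomputed for each design.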

Numerical Application
This section illustrates strategies for selecting an appropriate model for the physically meaningful functional relationship between the explanatory variables, given by the coordinates, and the response variable, given by the observed percentage of iron (Fe). We study the data provided in Tahir (2010), obtained from a mining company. The data consist of the percentage of Fe observed independently over a 7 × 14 lattice of drilling bores in the company's exploration region, with 7 equidistant columns running from west to east and 14 equidistant rows running from south to north, as shown in Fig. 6. Our goal is to check, by conducting both the MOSUM and the CUSUM tests, whether or not a first-order model is appropriate for the data. To this end we regard the observations as a realization of a regression model defined on the unit square I² by mapping the coordinate (-5824, -6000), where the observation process was initiated, to the point (0, 0) and the coordinate (-5825, -5725), where the observation process ended, to the point (1, 1). To stabilize the variance we apply a logarithmic transformation to the percentages of Fe and denote the transformed measurement by LogFe. A preliminary goodness-of-fit check for normality, presented in Fig. 7, shows that the distribution of LogFe does not fit the normal family. Table 2 presents the critical values and the corresponding p-values of the MOSUM and CUSUM tests, compared with those of the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) tests proposed in (Somayasa et al., 2015a); see also the cumulative sums defined in (Bischoff and Somayasa, 2009; Somayasa et al., 2015a; Xie and MacNeill, 2006). For each assumed model we also calculate $\tilde{\sigma}_n$ using the method proposed in (Arnold, 1981).
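The two preprocessing steps described above, mapping the survey coordinates to the unit square and log-transforming the Fe percentages, can be sketched as follows. This is an illustration only: the function names are hypothetical, the natural logarithm is assumed (the base is not stated), and the corner coordinates in the usage line are made up rather than taken from the survey.

```python
import numpy as np

def to_unit_square(x, y, origin, end):
    """Affine, coordinate-wise map sending `origin` to (0, 0) and `end` to (1, 1)."""
    (x0, y0), (x1, y1) = origin, end
    u = (np.asarray(x, dtype=float) - x0) / (x1 - x0)
    v = (np.asarray(y, dtype=float) - y0) / (y1 - y0)
    return u, v

def log_fe(percent_fe):
    """Log transform of the Fe percentages (natural logarithm assumed)."""
    return np.log(np.asarray(percent_fe, dtype=float))

# Hypothetical corner coordinates, not those of the survey.
u, v = to_unit_square([100.0, 250.0, 400.0], [200.0, 550.0, 900.0],
                      (100.0, 200.0), (400.0, 900.0))
print(u, v)
```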
For the constant model, both the MOSUM and the CUSUM tests yield very small p-values, which means that the constant model is not appropriate for LogFe under these tests. A quite different result is obtained from the p-values of the KS and CvM tests, according to which the constant model could be adequate at levels less than 3% for KS and less than 1% for CvM, respectively.
The MOSUM and the CUSUM tests do not reject the first-order model at any level of significance α ≤ 1.668% and α ≤ 0.297%, respectively. This means that at these levels there is no significant evidence against the first-order model. Together with the p-values of the KS and CvM tests, which do not reject the hypothesis for almost all frequently used values of α, this suggests that the first-order model is appropriate for LogFe, which is also consistent with the scatter plot of the data. Although the second-order model is also accepted by the MOSUM and KS tests, as their p-values show, the CUSUM and CvM tests indicate that it fits only for α ≤ 0.459% and α ≤ 1.24%, respectively; we therefore recommend the first-order model as the most appropriate one for LogFe.

Concluding Remark
We established an asymptotic size α test for checking the appropriateness of a spatial regression, whose critical region is constructed from the probability distribution of a statistic expressed as the integral of the competing known regression function with respect to the generalized (h1h2)-Slepian field. This statistic appears as the limit of the integral of that regression function with respect to the MOSUM process of the matrix of observations. A second test, the CUSUM test, is defined analogously. We show that our tests are more applicable in cases where the design strategy must be incorporated into the analysis. Besides that, our test procedures are tractable in the sense that quantities such as the quantiles of the limiting distribution and the p-values can be computed, analytically for the CUSUM test and by Monte Carlo simulation for the MOSUM test. Based on the limiting power functions of both tests, the MOSUM test is shown to be asymptotically more sensitive than the CUSUM test in detecting a change of the model. In the present work the results were derived for independently distributed observations. In future work we will place our setup in a more general and realistic situation in which the observations are dependent, for instance stationary, which will be useful for modeling spatial data. In a forthcoming paper by Somayasa et al., the investigation is extended to the multivariate spatial regression model defined in (Somayasa et al., 2015b; Somayasa and Wibawa, 2015; Somayasa et al., 2016) by considering the moving sums process of the multivariate recursive residuals.

Author's Contributions
The corresponding author of this work contributed to establishing the mathematical derivation of the results and to developing the simulation. The second author took part in interpreting the model from the perspective of geostatistics. The third author provided discussion regarding the suitability of the assumed model for the mining data. The fourth author evaluated the first draft of the paper.

Remark A.2
Some immediate consequences of the definition of S_{l;P} can be summarized as follows:

Theorem C.2. (Construction of the Set-Indexed Gaussian White Noise)
For the construction we refer the reader to (Alexander and Pyke, 1986; Gaenssler, 1993; Pyke, 1983), which establishes the proof.

Proposition C.3
The process W_P constitutes a finite signed measure P-almost surely on A0, that is, there exists a set Ω′ ⊂ Ω with P(Ω′^C) = 0 such that W_P(ω) is a finite signed measure on A0 for all ω ∈ Ω′.

Since a normal distribution is uniquely determined by its mean and variance, we can conclude that both random variables are equal in distribution, that is:

Proof
This means that W P is countably additive P-a.s., finishing the proof.

Proposition C.4
For any f, g ∈ L²(D, P), the integrals ∫_D f dW_P and ∫_D (f − g) dW_P are normally distributed with mean 0 and variances ||f||²_P and ||f − g||²_P, respectively. Indeed, ∫_D f dW_P is normally distributed with mean 0 and variance ||f||²_P; since f − g ∈ L²(D, P), the same argument shows that ∫_D (f − g) dW_P is normally distributed with mean 0 and variance ||f − g||²_P. We are done.
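The distributional statement of Proposition C.4 can be checked by simulation. This Python sketch discretizes the white noise for the uniform P on $[0,1]^2$ (an assumption made for illustration): the unit square is split into an m × m grid of cells $C_i$, the cell masses $W_P(C_i)$ are drawn as independent $N(0, P(C_i))$ variables with $P(C_i) = 1/m^2$, and $\int f\,dW_P$ is approximated by $\sum_i f(x_i) W_P(C_i)$ with $x_i$ the cell midpoints. The function name is hypothetical.

```python
import numpy as np

def white_noise_integral_samples(f, m, runs, seed=0):
    """Samples of sum_i f(x_i) W(C_i) for the uniform P on [0, 1]^2, where C_i
    are the cells of an m x m grid, x_i their midpoints and the cell masses
    W(C_i) are independent N(0, 1 / m^2). Returns the samples together with
    the squared norm of f under the discretized measure."""
    rng = np.random.default_rng(seed)
    mid = (np.arange(m) + 0.5) / m
    T, S = np.meshgrid(mid, mid, indexing="ij")
    fx = f(T, S).ravel()
    cell_noise = rng.normal(scale=1.0 / m, size=(runs, m * m))  # sd = sqrt(P(C_i))
    return cell_noise @ fx, float(np.mean(fx ** 2))

samples, norm_sq = white_noise_integral_samples(lambda t, s: t + s, m=20, runs=10_000)
print(samples.mean(), samples.var(), norm_sq)  # mean near 0, variance near ||f||_P^2
```

For f(t, s) = t + s the squared norm is $\|f\|_P^2 = 7/6$, and the sample variance of the simulated integrals is close to it, as the proposition predicts.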