Spline Estimator in Multi-Response Nonparametric Regression Model with Unequal Correlation of Errors

Problem statement: In many applications two or more dependent variables are observed at several values of the independent variables, such as at time points. The statistical problems are to estimate functions that model their dependence on the independent variables and to investigate relationships between these functions. Nonparametric regression models, especially smoothing splines, provide powerful tools for modeling the functions that describe the association between these variables. Approach: Penalized weighted least-squares was used to jointly estimate nonparametric functions from contemporaneously correlated data. We applied Generalized Maximum Likelihood (GML), Generalized Cross Validation (GCV) and leaving-out-one-pair Cross Validation (CV) to estimate the smoothing parameters, the weighting parameters and the correlation parameter. Results: In this study we formulated the multi-response nonparametric regression model with unequal correlation of errors and gave a theoretical method for both obtaining the distribution of the response and estimating the nonparametric functions in the model. We also estimated the smoothing parameters, the weighting parameters and the correlation parameter simultaneously by applying three methods: GML, GCV and CV. Conclusion: The distribution of the responses is normal. With multiple correlated responses it is better to estimate these functions jointly using penalized weighted least-squares.


INTRODUCTION
The functions that describe the association between two or more dependent variables observed at several values of the independent variables, such as at multiple time points, can be modeled using smoothing splines. Many authors have studied spline estimators for the regression curve of nonparametric regression models. Kimeldorf and Wahba (1971); Craven and Wahba (1979) and Wahba (1985) proposed the original spline estimator to estimate the regression curve of smooth data. Cox (1983) and Cox and O'Sullivan (1996) used M-type splines to overcome outliers in nonparametric regression. Wahba (1983) proposed polynomial splines to obtain confidence intervals based on the posterior covariance function. Wahba (2000) compared GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Oehlert (1992) and Koenker et al. (1994) introduced relaxed splines and quantile splines, respectively. Budiantara et al. (1997) studied the weighted spline estimator in the nonparametric regression model with different variances. Wahba (1992) introduced some techniques for spline statistical model building using reproducing kernel Hilbert spaces. Cardot et al. (2007) studied the asymptotic properties of smoothing spline estimators in functional linear regression with errors-in-variables. Liu et al. (2007) proposed smoothing spline estimation of variance functions. Aydin (2007) showed that the spline estimator performs better than the kernel estimator in estimating a nonparametric regression model for gross national product data. All these authors studied spline estimators for single-response nonparametric models only.
In real applications, we frequently face problems in which two or more dependent variables are observed at several values of the independent variables, such as at multiple time points. Multi-response nonparametric regression models provide powerful tools for modeling the functions that describe the association between these variables.
Many authors have considered nonparametric models for multi-response data. Wegman (1981); Miller and Wegman (1987) and Flessler (1991) proposed algorithms for spline smoothing. Wahba (1990) developed the theory of general smoothing splines using reproducing kernel Hilbert spaces. Gooijer et al. (1991) and Francisco-Fernandez and Opsomer (2005) proposed methods for estimating nonparametric regression models with serially and spatially correlated errors, respectively. Wang et al. (2000) proposed spline smoothing for estimating nonparametric functions from bivariate data. Lestari (2008a) developed a spline estimator for the biresponse nonparametric regression model with unequal variances of errors. Lestari (2008b) studied the penalized weighted least-squares estimator for the bivariate nonparametric regression model with correlated errors. Lestari et al. (2010a; 2010b) proposed a spline approach for estimating the regression function of the multi-response nonparametric regression model in special cases, i.e., when the variances and correlations of errors are the same for every response.
All, except Wang et al. (2000); Lestari (2008a;2008b) and Lestari et al. (2010a;2010b), assumed that the covariance matrix is known, which is usually not the case in practice. When the covariance matrix is unknown, it has to be estimated from the data and this can affect the estimates of the smoothing parameters (Wang, 1998).
In this study, we develop mathematical statistics methods for obtaining the distribution of the responses and for estimating the nonparametric functions and the parameters in the multi-response nonparametric regression model. We assume that the covariance parameters are unknown, that errors of the same response have the same variance and that the errors have different correlations. Based on this multi-response nonparametric regression model, we estimate the regression functions using penalized weighted least-squares. We then describe three methods, Generalized Maximum Likelihood (GML), Generalized Cross Validation (GCV) and leaving-out-one-pair Cross Validation (CV), for estimating the smoothing parameters, the weighting parameters and the correlation parameter simultaneously.

MATERIALS AND METHODS
Assume that the data $\{y_{ki}, t_{ki}\}$ follow the multi-response nonparametric regression model:
$$y_{ki} = f_k(t_{ki}) + \varepsilon_{ki}$$
Where:
$k = 1, 2, \dots, p$
$i = 1, 2, \dots, n_k$
That is, the $i$-th response of the $k$-th variable, $y_{ki}$, is generated by the $k$-th function $f_k$ evaluated at the design point $t_{ki}$ plus a random error $\varepsilon_{ki}$. Assume $\varepsilon_{ki} \sim N(0, \sigma_k^2)$ for fixed $k = 1, 2, \dots, p$, and that the correlation of the errors differs across $i = 1, 2, \dots, n_k$, i.e., $\mathrm{Corr}(\varepsilon_{ki}, \varepsilon_{li}) = \rho_i$ for $k \ne l$ and zero otherwise. Under this assumption the correlation $\rho_i$ is the same for every pair of responses at a given $i$, but may differ from one $i$ to another.
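To make the error structure concrete, the following is a minimal simulation sketch. It assumes the simplest situation, in which all responses share one design grid (equal $n_k$), which is a simplification of the setting treated in this study. The helper names `pointwise_cov` and `simulate` are hypothetical and introduced only for illustration.

```python
import numpy as np

def pointwise_cov(sigmas, rho):
    """Covariance of (eps_1i, ..., eps_pi) at one index i.

    Var(eps_ki) = sigma_k^2 and Corr(eps_ki, eps_li) = rho for k != l.
    Positive definiteness requires -1/(p-1) < rho < 1.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    p = sigmas.size
    corr = np.full((p, p), float(rho))
    np.fill_diagonal(corr, 1.0)
    return np.outer(sigmas, sigmas) * corr

def simulate(funcs, t, sigmas, rhos, rng):
    """Draw y[k, i] = f_k(t_i) + eps_ki on a common grid t (assumption)."""
    p, n = len(funcs), len(t)
    y = np.empty((p, n))
    for i in range(n):
        # Errors at different indices i are independent of each other.
        eps = rng.multivariate_normal(np.zeros(p), pointwise_cov(sigmas, rhos[i]))
        y[:, i] = np.array([f(t[i]) for f in funcs]) + eps
    return y

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
rhos = 0.2 + 0.6 * t          # correlation changes with i (illustrative choice)
y = simulate([np.sin, np.cos], t, sigmas=[0.1, 0.3], rhos=rhos, rng=rng)
```

The correlation profile `rhos` and the two test functions are arbitrary choices made for the illustration, not values taken from the study.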
There are four cases for this correlation assumption: (i) $n_1 < n_2 < \dots < n_p$; (ii) $n_1 > n_2 > \dots > n_p$; (iii) $n_1 = n_2 = \dots = n_p$; and (iv) not all $n_i < n_{i+1}$, not all $n_i > n_{i+1}$ and not all $n_i = n_{i+1}$ (for $i = 1, 2, \dots, p-1$). In this study we describe the first case only, i.e., $n_1 < n_2 < \dots < n_p$. The other three cases can be handled similarly. In addition, for simplicity of notation, we assume that the domain of each function is $[0,1]$ and that $f_k$ is an element of the Sobolev space $W_2$, i.e., $f_k \in W_2 = \{f : f, f' \text{ absolutely continuous}, \int_0^1 (f''(t))^2\,dt < \infty\}$. Our methods can be easily extended to the general smoothing spline models where the $p$ domains are arbitrary (and thus could be different) and the observations are linear functionals instead of evaluations (Wahba, 1985; 1990).
… and $\Omega_{qs}$ be an $n_q \times n_s$ matrix whose $(i, j)$-th element equals $\rho_i$ ($i = 1, 2, \dots, n_q$) if the $i$-th element of $\mathbf{y}_q$ and the $j$-th element of $\mathbf{y}_s$ form a pair, and zero otherwise. Therefore, by taking …

Estimation of the nonparametric functions of the multi-response nonparametric regression model:
The nonparametric functions $f_k$ are estimated by penalized weighted least-squares. The parameters $\lambda_k$ ($k = 1, 2, \dots, p$) control the tradeoff between goodness-of-fit and the smoothness of the estimates and are referred to as smoothing parameters.
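As a hedged illustration of what such a criterion typically looks like, the display below gives a generic penalized weighted least-squares problem in the style of Wang et al. (2000). It is a sketch under assumptions, not necessarily the exact expression used in this study: $\mathbf{y}$ and $\mathbf{f}$ are taken to be the stacked response and function-value vectors, and $W$ is taken to be the inverse of the error covariance matrix up to a scale factor. The normalisation and the parametrisation of the weights may differ here.

```latex
\min_{f_1, \dots, f_p \in W_2}
\left\{
  (\mathbf{y} - \mathbf{f})^{T} W (\mathbf{y} - \mathbf{f})
  + \sum_{k=1}^{p} \lambda_k \int_0^1 \bigl(f_k''(t)\bigr)^2 \, dt
\right\}
```

The first term measures weighted goodness-of-fit across all responses jointly. The second term penalizes roughness separately for each $f_k$, so that a large $\lambda_k$ pushes the $k$-th estimate towards a straight line.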
We extend the method of Wang (1998), developed for the single-response nonparametric regression model, to the multi-response nonparametric regression model with unequal correlation of errors. Let $\phi_\nu(t) = t^{\nu-1}/(\nu-1)!$ for $\nu = 1, 2$, and $R_1(s,t) = k_2(s)k_2(t) - k_4(s-t)$, where $k_\nu(\cdot) = B_\nu(\cdot)/\nu!$ and $B_\nu(\cdot)$ is the $\nu$-th Bernoulli polynomial. Let … Wang (1998) and Wahba (1985) to the case of multi-response with unequal correlation of errors, we can show that, for fixed $\lambda_k$, $\gamma_{ij}$ and $\rho_i$, the solution to (4) is: … is also a solution to (6). Thus we need to solve the simultaneous Eq. 7 for $\mathbf{c}$ and $\mathbf{d}$. In fact, … is the "hat matrix". Here, $A$ is not symmetric, which is different from the usual independent case.
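The construction above can be sketched numerically. The code below builds $\phi_1$, $\phi_2$ and $R_1$ from the Bernoulli polynomials $B_2$ and $B_4$. For a given weight matrix $W$ and smoothing parameters it then solves the linear system $(\Sigma + W^{-1}\Lambda)\mathbf{c} + T\mathbf{d} = \mathbf{y}$, $T^{T}\mathbf{c} = \mathbf{0}$. That system is derived here from a generic penalized weighted least-squares criterion and is an assumption, not necessarily the paper's Eq. 7. The names `fit`, `T`, `S` and `lams` are hypothetical, and $k_4$ is evaluated at $|s-t|$, which is one common convention on $[0,1]$.

```python
import numpy as np

def k2(x):
    return (x**2 - x + 1.0 / 6.0) / 2.0                     # B_2(x) / 2!

def k4(x):
    return (x**4 - 2.0 * x**3 + x**2 - 1.0 / 30.0) / 24.0   # B_4(x) / 4!

def R1(s, t):
    """Matrix of R_1(s_i, t_j) = k_2(s_i) k_2(t_j) - k_4(|s_i - t_j|)."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    return k2(s) * k2(t) - k4(np.abs(s - t))

def fit(ts, y, W, lams):
    """Return (c, d, A) for design points ts[k], stacked responses y,
    weight matrix W and one smoothing parameter lams[k] per response."""
    sizes = [len(t) for t in ts]
    n, p = sum(sizes), len(ts)
    T = np.zeros((n, 2 * p))      # block-diagonal basis of the null space
    S = np.zeros((n, n))          # block-diagonal Gram matrices of R_1
    lam = np.zeros(n)
    start = 0
    for k, t in enumerate(ts):
        t = np.asarray(t, dtype=float)
        rows = slice(start, start + sizes[k])
        T[rows, 2 * k] = 1.0      # phi_1(t) = 1
        T[rows, 2 * k + 1] = t    # phi_2(t) = t
        S[rows, rows] = R1(t, t)
        lam[rows] = lams[k]
        start += sizes[k]
    M = S + np.linalg.inv(W) * lam[None, :]   # S + W^{-1} Lambda
    K = np.block([[M, T], [T.T, np.zeros((2 * p, 2 * p))]])
    rhs = np.vstack([np.eye(n), np.zeros((2 * p, n))])
    sol = np.linalg.solve(K, rhs)
    C, D = sol[:n], sol[n:]       # linear maps y -> c and y -> d
    A = S @ C + T @ D             # hat matrix: fitted values are A @ y
    return C @ y, D @ y, A
```

Because $W^{-1}\Lambda$ is not symmetric when the smoothing parameters differ across correlated responses, the resulting `A` is in general not symmetric either, in line with the remark above. Data that are exactly linear in $t$ are reproduced without change, which gives a simple sanity check.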

Estimation of parameters:
So far we have assumed that the parameters $\lambda_k$, $\gamma_{ij}$ and $\rho_i$ are fixed. In practice it is very important to estimate these parameters from the data. Since the observations are correlated, popular methods such as the usual Generalized Maximum Likelihood (GML) method and the Generalized Cross Validation (GCV) method may underestimate the smoothing parameters (Wang, 1998). In this study we propose the following three methods to estimate the smoothing parameters $\lambda_k$, the weighting parameters $r_k$ and $\gamma_{ij}$ and the correlation parameter $\rho_i$ simultaneously: an extension of the GML method based on a Bayesian model, an extension of the GCV method and leaving-out-one-pair cross validation. Wang (1998) proposed the GML and GCV methods for correlated observations with one smoothing parameter. Wang et al. (2000) proposed the GML and GCV methods for correlated observations with two smoothing parameters. In the multi-response nonparametric regression model with $p$ responses, there are $p$ smoothing parameters which need to be estimated simultaneously together with the covariance parameters. We extend the GML and GCV derivations in Wang (1998); Wang et al. (2000) and Lestari et al. (2010a; 2010b) to the case of multi-response with unequal correlation of errors as follows.
The GML estimates of $\lambda_k$, $\gamma_{ij}$, $r_k$ and $\rho_i$ are minimizers of the following GML function: …
Where:
$n = n_1 + n_2 + \dots + n_p$
$\det^{+}$ = The product of the nonzero eigenvalues
$\mathbf{z} = Q_2^{T}\mathbf{y}$
The minimizers of $M(\lambda_k, \gamma_{ij}, r_k, \rho_i)$ are called GML estimates.
The GCV estimates of $\lambda_k$, $\gamma_{ij}$, $r_k$ and $\rho_i$ are minimizers of the following GCV function: … In the following we propose a cross validation method based on a leaving-out-one-pair procedure. Suppose there are a total of $N$ ($N \ge \max\{n_1, n_2, \dots, n_p\}$) distinct time points and thus $N$ pairs of observations. Any one observation in a pair may be missing. These pairs are numbered from 1 to $N$. We use the following notation: superscript $(i)$ denotes the collection of elements corresponding to the $i$-th pair; superscript $[i]$ denotes the collection of elements after deleting the $i$-th pair; superscript $\{i\}$ denotes the solution for $f_k$ without the $i$-th pair. When one observation in a pair is missing, the superscripts indicate a single observation instead of a pair. The solutions to: … where the first inequality holds because, after switching rows and columns, we have: … As a consequence of this lemma, we do not need to solve separate minimization problems (13) for each deleting-one-pair set. All we need to do is to solve the following equations: … Note that (16) is exactly the same as the "leaving-out-one" lemma in the independent case.
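The two ingredients named in these definitions can be computed as follows. This sketch assumes that $Q_2$ comes from a full QR decomposition of the null-space design matrix $T$, i.e., $T = (Q_1\ Q_2)(R^{T}\ 0)^{T}$, which is the usual convention in the smoothing spline literature but is not stated explicitly here. The function names `null_space_projection` and `det_plus` are hypothetical.

```python
import numpy as np

def null_space_projection(T, y):
    """Return Q_2 and z = Q_2^T y, where the columns of Q_2 span the
    orthogonal complement of the column space of T (assumed full rank)."""
    n, m = T.shape
    Q, _ = np.linalg.qr(T, mode="complete")   # Q is n x n
    Q2 = Q[:, m:]                             # last n - m columns
    return Q2, Q2.T @ y

def det_plus(B, tol=1e-10):
    """Product of the nonzero eigenvalues of a square matrix B."""
    eig = np.linalg.eigvals(B)
    scale = max(1.0, float(np.abs(eig).max()))
    nonzero = eig[np.abs(eig) > tol * scale]
    return float(np.real(np.prod(nonzero)))
```

The tolerance `tol` that decides which eigenvalues count as zero is an arbitrary numerical choice. In practice one would usually work with the logarithm of $\det^{+}$ to avoid overflow when $n$ is large.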
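Returning to the GCV criterion, a score of this kind can be evaluated directly once the hat matrix $A$ and the weight matrix $W$ are available. The sketch below assumes the form commonly used for correlated observations, $V = \frac{1}{n}\lVert W(I-A)\mathbf{y}\rVert^2 / [\frac{1}{n}\mathrm{tr}(W(I-A))]^2$. The exact multi-response expression in this study may differ, and the function name `gcv_score` is hypothetical.

```python
import numpy as np

def gcv_score(y, A, W):
    """GCV score for fitted values A @ y under weight matrix W
    (assumed form; see the accompanying text)."""
    n = len(y)
    B = W @ (np.eye(n) - A)
    return (np.sum((B @ y) ** 2) / n) / (np.trace(B) / n) ** 2

# Sanity check with W = I and an ordinary least-squares hat matrix:
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
A_ols = X @ np.linalg.inv(X.T @ X) @ X.T
y_demo = np.array([0.0, 1.0, 0.0, 1.0])
score = gcv_score(y_demo, A_ols, np.eye(4))
```

In an actual fit, this score would be minimized numerically over the smoothing, weighting and correlation parameters, with $A$ and $W$ recomputed at every candidate value.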

The vector of responses $\mathbf{y}$ follows a multivariate normal distribution with mean $\mathbf{f}$ and covariance matrix $\theta W^{-1}$. General smoothing spline models provide flexibility for estimating nonparametric functions and are widely used in many areas. With multiple correlated responses it is better to estimate these functions jointly using penalized weighted least-squares.