Spline Estimator for Bi-Responses and Multi-Predictors Nonparametric Regression Model in Case of Longitudinal Data

Corresponding Author: Adji Achmad Rinaldo, Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Brawijaya, Jl Veteran, Malang 65111, Indonesia. Email: fernandes@staff.ub.ac.id

Abstract: The nonparametric regression approach is used when the shape of the regression curve is unknown. The spline estimator approach for longitudinal data can accommodate the correlation between observations within the same subject, which is not found in cross-section data, so that the autocorrelation assumption problem can be resolved. In addition, the bi-responses approach accommodates the correlation between the response variables. The purposes of this study are (1) to obtain the functional form of the nonparametric bi-responses and multi-predictors regression on longitudinal data, (2) to obtain the spline estimator for estimating the nonparametric bi-responses and multi-predictors regression curve on longitudinal data and (3) to apply the spline estimator in estimating the nonparametric bi-responses and multi-predictors regression curve on longitudinal data. The spline estimator for the bi-responses and multi-predictors nonparametric regression on longitudinal data is the one that minimizes the Penalized Weighted Least Square (PWLS) criterion. The application to a data set of patients with pulmonary tuberculosis shows that the spline estimator can be applied and gives an $R^2$ value of 97.77%.


Introduction
The spline estimator in nonparametric regression can be used to estimate functions that represent the association between two response variables observed at several values of multiple predictor variables. One use of the spline estimator is in the analysis of longitudinal data, which combine cross-section and time-series data: observations are made on r mutually independent subjects (cross-section), each subject is observed repeatedly over n time periods (time-series) and observations within the same subject are correlated (Diggle et al., 2013).
In applications in various fields, problems involving more than one correlated response variable are often encountered, so the regression model developed here is the bi-responses and multi-predictors model given in Equation (1). The first predictor variable, $x_{1it}$, is the observation time (design time point) (Wu and Zhang, 2006) and $f$ is the regression curve relating the predictor variables to the response variable $y$ for the $i$-th subject. The nonparametric regression approach is used when the shape of the curve $f$ is unknown (Eubank, 1999).
Nonparametric regression is a regression approach suitable for data patterns of unknown shape, or for cases where complete information about previous data patterns is not available (Budiantara et al., 1997; Eubank, 1999). In the nonparametric regression approach, the shape of the estimated regression curve is determined by the pattern of the existing data. Some nonparametric regression approaches are the kernel (Hardle, 1990), the spline (Budiantara et al., 1997; Wahba, 1990) and wavelets (Antoniadis and Sapatinas, 2007). The spline estimator is one of the most commonly used estimators in nonparametric regression because it has a good visual interpretation, is highly flexible and is able to handle smooth functions (Eubank, 1999). The regression curve $f$ in the spline estimator for longitudinal data is assumed to be smooth, meaning that it is contained in a certain function space, in particular a Sobolev space (Wu and Zhang, 2006).
The spline estimator in the bi-responses and multi-predictors nonparametric regression model for longitudinal data generalizes the spline estimator in the single-response nonparametric regression model for longitudinal data (Chen and Wang, 2011; Lin et al., 2006), as well as the spline estimator in the multi-response nonparametric regression model for cross-section data (Lestari et al., 2010; Wang et al., 2000). The spline estimator approach for longitudinal data can accommodate the correlation between observations within the same subject, which is not found in cross-section data, so that the autocorrelation assumption problem can be resolved. In addition, the bi-responses approach accommodates the correlation between the response variables. Fernandes et al. (2014a) worked out the RKHS approach for the bi-responses case and Fernandes et al. (2014b) for the multi-predictors case.
Reproducing Kernel Hilbert Spaces (RKHS) play a central role in penalized regression, providing both the form of the model and its estimator. Hilbert spaces that display certain properties on certain linear operators are RKHS. The function $f$ is unknown and is assumed smooth in the sense of being contained in the space $H$ (Wahba, 1990). To estimate $\tilde{f}$, the RKHS approach is used to solve the Penalized Weighted Least Square (PWLS) optimization. Based on the above background, the purposes of this study are (1) to obtain the functional form of the nonparametric bi-responses and multi-predictors regression on longitudinal data, (2) to obtain the spline estimator for estimating the nonparametric bi-responses and multi-predictors regression curve on longitudinal data and (3) to apply the spline estimator in estimating the nonparametric bi-responses and multi-predictors regression curve on longitudinal data, using data on patients with pulmonary tuberculosis from Chozin (2009).

Materials and Methods
Suppose we want to find a function $f_{ki}$ in Equation (1) that belongs to the space defined in Equation (2), which is called a Sobolev space (Wahba, 1990). If the domain is restricted to $[0, 1]$ by a transformation, the condition $f_{ki}(0) = 0$ is not strictly necessary, but it simplifies the presentation of the derivatives $f'_{ki}$ and $f''_{ki}$. Defining an inner product on $W_2^m$ induces a norm over the space $W_2^m$ that is small for "smooth" functions (Wahba, 1990). For the interpolation problem, the interpolator $f_{ki}$ satisfies the system of equations (5) and the smoothest such function $f_{ki}$ satisfies Equation (6). The function defined from these relations turns out to be a reproducing kernel.
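As an illustration of a reproducing kernel on a Sobolev space over $[0, 1]$, the sketch below uses a standard textbook kernel for a second-derivative penalty, $R(s, t) = \int_0^1 (s - u)_+ (t - u)_+ \, du$, and checks its closed form against numerical integration. The function names `rk_closed_form` and `rk_numeric` and the design points are hypothetical, and this particular kernel is an assumption chosen for illustration; it is not necessarily the kernel defined in the text.

```python
import numpy as np

def rk_closed_form(s, t):
    """Closed form of R(s, t) = int_0^1 (s - u)_+ (t - u)_+ du on [0, 1]."""
    lo = np.minimum(s, t)
    hi = np.maximum(s, t)
    return lo ** 2 * (3.0 * hi - lo) / 6.0

def rk_numeric(s, t, n_grid=20001):
    """Same integral by the trapezoid rule, used only as a check."""
    u = np.linspace(0.0, 1.0, n_grid)
    g = np.clip(s - u, 0.0, None) * np.clip(t - u, 0.0, None)
    return float(np.sum((g[1:] + g[:-1]) * np.diff(u)) / 2.0)

# Gram matrix V[a, b] = R(x_a, x_b) at a few hypothetical design points.
x = np.linspace(0.1, 1.0, 6)
V = rk_closed_form(x[:, None], x[None, :])

# A reproducing kernel gives a symmetric, positive semi-definite Gram matrix.
eigenvalues = np.linalg.eigvalsh(V)
```

A Gram matrix of this kind plays the role of one diagonal block of the matrix $V$ used later in the estimator, under the assumption stated above.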
The bi-responses and multi-predictors nonparametric regression model for longitudinal data, given in Equation (1), involves $r$ subjects with $n$ observations on each subject. The random error $\varepsilon_{kit}$ is assumed to follow an $N$-variate normal distribution ($N = 2rn$) with zero mean and covariance matrix $W^{-1}$, given in Equation (9). The spline approach treats $f_{ki}$ in Equation (1) as an unknown regression curve that is only assumed to be smooth, in the sense of being contained in a specified function space, in particular the Sobolev space $W_2^m$ for a positive integer $m$ (polynomial order: $m = 2$ linear, $m = 3$ quadratic, $m = 4$ cubic). The Penalized Weighted Least Square (PWLS) optimization uses weighting based on the random error variance-covariance matrix described in Equation (9). The estimate of the regression curve $f_{ki}$ is obtained by solving the PWLS optimization in Equation (11) (Eubank, 1999). Besides the weight, the PWLS optimization in Equation (11) uses $2r$ smoothing parameters $\lambda_{ki}$ that control the balance between the goodness of fit (the first term) and the roughness penalty (the second term). The covariance matrix in (9) provides the weighting in PWLS (11), which accommodates (1) the correlation between observations within the same subject and (2) the correlation between the response variables.
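The trade-off between weighted goodness of fit and roughness penalty can be sketched numerically. The example below is a minimal sketch under several assumptions that are not taken from the text: a single curve with one smoothing parameter `lam` (the model above uses $2r$ of them), a representation of the fitted curve as $Td + Vc$, the objective $N^{-1}(y - Td - Vc)^T W (y - Td - Vc) + \lambda c^T V c$, a textbook kernel in `rk`, an AR(1)-type covariance and made-up data. All function and variable names are hypothetical.

```python
import numpy as np

def rk(s, t):
    """Assumed reproducing kernel on [0, 1] (second-derivative penalty)."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo ** 2 * (3.0 * hi - lo) / 6.0

def pwls_fit(T, V, W, y, lam):
    """Minimise (1/N)(y - Td - Vc)' W (y - Td - Vc) + lam * c' V c."""
    N = y.shape[0]
    M = V + N * lam * np.linalg.inv(W)        # M = V + N * lam * W^{-1}
    Minv_T = np.linalg.solve(M, T)
    Minv_y = np.linalg.solve(M, y)
    d = np.linalg.solve(T.T @ Minv_T, T.T @ Minv_y)
    c = np.linalg.solve(M, y - T @ d)
    return d, c, T @ d + V @ c

def pwls_objective(T, V, W, y, lam, d, c):
    """Weighted goodness of fit plus roughness penalty."""
    resid = y - T @ d - V @ c
    return resid @ W @ resid / y.shape[0] + lam * (c @ V @ c)

# Hypothetical data: one subject, one response, n = 8 time points.
x = np.linspace(0.125, 1.0, 8)
y = np.sin(2.0 * np.pi * x) + 0.1 * np.cos(9.0 * x)
T = np.column_stack([np.ones_like(x), x])     # polynomial part
V = rk(x[:, None], x[None, :])                # kernel part

# Assumed AR(1)-type covariance within the subject; W is its inverse.
lags = np.abs(np.subtract.outer(np.arange(8), np.arange(8)))
Sigma = 0.3 ** lags
W = np.linalg.inv(Sigma)

lam = 1e-3
d, c, fitted = pwls_fit(T, V, W, y, lam)
```

Setting the partial derivatives of this objective with respect to `c` and `d` to zero gives the two conditions `W @ (y - fitted) == N * lam * c` and `T.T @ c == 0`, which the closed-form solution in `pwls_fit` satisfies. With $2r$ curves, `T` and `V` would become block matrices and `lam` a set of per-curve parameters.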

Result
Assume that the data follow the bi-responses and multi-predictors nonparametric regression model for longitudinal data, where $f_{ki}$ is the unknown function, assumed smooth in the sense of being contained in the space $H$. To obtain the function (12), Theorem 1 is presented below.

Theorem 1
If the data pairs follow the bi-responses and multi-predictors nonparametric regression model for longitudinal data given in Equation (1), then the bi-responses and multi-predictors nonparametric regression function for longitudinal data has the form given in Equation (12).

Proof of Theorem 1:
Hilbert spaces that display certain properties on certain linear operators are RKHS. The function is a bounded linear functional on the space $H$ and, for $f_{ki} \in H$, Equation (15) is obtained, which can be expressed as Equation (16). In the description of Equation (16), the observations within the same subject are time ordered for longitudinal data: the observation at time $t$ depends only on those at $(t-1), (t-2), \ldots, (t-n+1)$ and does not depend on the one at $(t+1)$. The remaining results are obtained in the same way. $T$ is a matrix of order $(2rn) \times (2rm)$ and $V$ is a matrix of order $(2rn) \times (2rn)$. For $x \in [0, 1]$, Equation (22) is solved. Theorem 1 is the basis for obtaining the spline estimates, as shown in Theorem 2.

Theorem 2
The spline estimator in the bi-responses and multi-predictors nonparametric regression for longitudinal data. If the data pairs follow the bi-responses nonparametric regression model involving multiple predictor variables on longitudinal data, with the functional form of the bi-responses and multi-predictors nonparametric regression for longitudinal data presented in Theorem 1 and under the assumption presented in Equation (9), then the spline estimator that minimizes the PWLS in Equation (11) is the estimator $\tilde{f}$ derived in the proof below.

Proof of Theorem 2:
Given Equation (19), the regression curve estimator $\tilde{f}$ is obtained once $T$ and $V$ are given as in Equations (20) and (21), subject to a constraint. The space used is the Sobolev space of order 2 and the norm of each function is defined on that space. The constrained optimization in Equation (23) can be written as Equation (24). The weighted optimization with constraint (24) is equivalent to solving the PWLS optimization in Equation (11). To solve the optimization, the penalty component must first be written out; based on Equation (25), the penalty value is obtained. The optimization (28) is solved by taking the partial derivative of $Q(\tilde{c}, \tilde{d})$ with respect to $\tilde{c}$ and setting the result to zero, and then taking the partial derivative with respect to $\tilde{d}$ and setting the result to zero. Equation (28) then yields the equations that follow and Equation (26) is multiplied by $W$. Based on Equations (32) and (33), the estimator is obtained for the bi-responses and multi-predictors nonparametric regression curve for longitudinal data involving a single predictor variable.

The application uses data on patients with pulmonary tuberculosis from Chozin (2009). The data come from 4 patients (s1, s2, s3, s4), with two response variables, suPAR ($y_1$) and monocyte ($y_2$), and one predictor variable, the design time point ($x_1$). Figure 1 shows the plot of each response and shows the correlation between the response variables.
The plot of the predictor against the responses is given in Fig. 2. The plot in Fig. 2 shows no particular pattern between the predictor variable $x$ and the response variable $y$ for each subject $i$. The smoothing parameters $\lambda_{ki}$ were selected by the minimum value of Generalized Cross-Validation (GCV). The optimization results gave a coefficient of determination of 97.77%. Figure 3 shows the results of the spline estimator for bi-response multi-predictor longitudinal data that give the minimum GCV value, with m = 4 (quadratic spline). For this data set, the optimal curve explains 97.77% of the variance of the original data.
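Choosing a smoothing parameter by minimum GCV and then reporting a coefficient of determination can be sketched as follows. This is a minimal illustration under assumptions that do not come from the text: a single curve with one smoothing parameter, a commonly used weighted GCV score (weighted mean squared residual divided by the squared mean of the trace of $I - A$, where $A$ is the smoother matrix), a textbook kernel, an AR(1)-type covariance and made-up data in place of the patient data. All names are hypothetical.

```python
import numpy as np

def rk(s, t):
    """Assumed reproducing kernel on [0, 1] (second-derivative penalty)."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo ** 2 * (3.0 * hi - lo) / 6.0

def hat_matrix(T, V, W, lam):
    """Matrix A(lam) with fitted = A(lam) @ y for the penalized smoother."""
    N = T.shape[0]
    M_inv = np.linalg.inv(V + N * lam * np.linalg.inv(W))
    B = np.linalg.solve(T.T @ M_inv @ T, T.T @ M_inv)   # d = B @ y
    return T @ B + V @ M_inv @ (np.eye(N) - T @ B)

def gcv(A, W, y):
    """Assumed weighted GCV score for a linear smoother A."""
    N = y.shape[0]
    resid = y - A @ y
    return (resid @ W @ resid / N) / (np.trace(np.eye(N) - A) / N) ** 2

def r_squared(y, fitted):
    """Coefficient of determination, 1 - SSE / SST."""
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical data: one subject, one response, n = 10 time points.
x = np.linspace(0.1, 1.0, 10)
y = np.sin(2.0 * np.pi * x) + 0.1 * np.cos(9.0 * x)
T = np.column_stack([np.ones_like(x), x])
V = rk(x[:, None], x[None, :])
lags = np.abs(np.subtract.outer(np.arange(10), np.arange(10)))
W = np.linalg.inv(0.3 ** lags)          # inverse of an assumed covariance

# Grid search: keep the smoothing parameter with the smallest GCV score.
lams = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
scores = [gcv(hat_matrix(T, V, W, lam), W, y) for lam in lams]
best_lam = lams[int(np.argmin(scores))]

fitted = hat_matrix(T, V, W, best_lam) @ y
r2 = r_squared(y, fitted)
```

With $2r$ smoothing parameters the search would run over a multi-dimensional grid (or use a numerical optimizer) instead of a single list of candidate values.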

Conclusion
Based on the results of the study presented in the previous part, the following can be concluded: Bi-responses and multi-predictors nonparametric regression model on longitudinal data on the equation