SMOOTHING SPLINE IN SEMIPARAMETRIC ADDITIVE REGRESSION MODEL WITH BAYESIAN APPROACH

The semiparametric additive regression model is a combination of parametric and nonparametric regression models. The parametric components are not linear but follow a polynomial pattern, while the nonparametric components have an unknown pattern and are assumed to be contained in the Sobolev space. The nonparametric components can be approximated by smoothing spline functions. In the development of smoothing splines, the classical statistical approach cannot be applied to inference problems such as constructing confidence intervals for the regression curve. To construct confidence intervals for the smoothing spline curve in the semiparametric additive regression model, we propose a Bayesian approach that assumes an improper Gaussian prior distribution for the nonparametric components and a multivariate normal prior distribution for the parametric components. In this study, we obtain parameter estimators for the parametric components and smoothing spline estimators for the nonparametric components of the semiparametric additive regression model. Moreover, we develop a method for selecting the smoothing parameters simultaneously using Generalized Maximum Likelihood (GML), and we construct confidence intervals for the parameters of the parametric components and for the smoothing spline functions of the nonparametric components using the Bayesian approach. These intervals are obtained by computing the posterior mean and posterior variance of the parametric component parameters and of the smoothing spline functions. We wrote R code to implement the estimation and inference procedures. Our simulation studies show that the estimation and inference methods perform reasonably well.


INTRODUCTION
In a regression model, some components carry sufficient information to describe the relationship pattern between the predictors and the response variable. However, there are also vague or nuisance components. Hence, in this study, the semiparametric additive regression model is used to overcome these difficulties. This model is a combination of parametric and nonparametric regression models. The parametric components are not linear but follow a polynomial pattern, while the nonparametric components have an unknown pattern and are assumed to be contained in the Sobolev space. The nonparametric components can be approximated by functions such as splines, local polynomials or kernels. Among these approximations, the spline function is highly flexible and can handle data whose behavior changes in certain sub-intervals (Eubank, 1999). This has also been pointed out by Liang (2006) and Aydin (2008), who compared the smoothing spline technique with the kernel technique in semiparametric and nonparametric regression.
Science Publications JMSS
There are three common approaches to estimating the regression function with splines in the semiparametric regression model: regression splines (truncated spline, cubic spline, B-spline), penalized splines (p-splines) and smoothing splines. For penalized and regression splines, the number and location of the knots must be chosen carefully, whereas the smoothing spline does not require knot selection. Furthermore, the smoothing spline performs better and is more flexible than the penalized and regression splines in the semiparametric regression model (Aydin and Tuzemen, 2010). The smoothing spline estimator can be obtained by classical approaches such as Penalized Least Squares (PLS), Penalized Maximum Likelihood or Penalized Likelihood. Besides the classical approaches, these estimators in semiparametric regression can also be obtained by a Bayesian approach (Wang, 2011).
Penalized and regression splines in the semiparametric additive regression model with a Bayesian approach have been studied by many authors. Wong and Kohn (1996), Li (2000), Smith et al. (2000), Kandala et al. (2001), Panagiotelis and Smith (2008) and Ryu et al. (2011) used regression splines with a Bayesian approach, while Lang and Brezger (2004), Jerak and Wagner (2006), Nott (2006), Costa (2008), Marley and Wand (2010) and Shen (2011) used p-splines with a Bayesian approach. Wang (2011) used a smoothing spline with a Bayesian approach in a semiparametric regression model whose parametric components are linear. However, estimation of the regression function by smoothing splines with a Bayesian approach in a semiparametric additive regression model whose parametric components are not linear (polynomial) has not yet been addressed.
In the development of smoothing splines, the classical statistical approach cannot be applied to inference problems such as constructing confidence intervals for the regression curve. Therefore, some researchers use a Bayesian approach to build such confidence intervals for the smoothing spline function. Wahba (1983) and Nychka (1988) used a Bayesian approach to construct confidence intervals for the smoothing spline in the nonparametric model. In this study we develop the smoothing spline in semiparametric additive regression models whose parametric components are not linear (polynomial) by using a Bayesian approach. We also develop a method for selecting the optimal smoothing parameters simultaneously in semiparametric additive regression models, and we build confidence intervals for the parametric component parameters and the smoothing spline functions. We illustrate the method with simulated data.

MATERIALS AND METHODS
This section presents the theory used to build smoothing spline estimators in the semiparametric additive regression model with a Bayesian approach.

Semiparametric Additive Regression Model
Suppose we have observations (x_1j,...,x_pj, z_1j,...,z_qj, y_j), where y_j is the response variable and j = 1,2,…,n indexes the n observations. The x_ij, i = 1,2,…,p, are predictor variables whose relationship with the response variable is not linear but follows a polynomial pattern. The z_kj, k = 1,2,…,q, are predictor variables whose relationship with the response variable is unknown. The relationship between (x_1j,...,x_pj), (z_1j,...,z_qj) and y_j is modeled by the semiparametric additive regression model (Equation 1). The γ_hi, h = 0,1,2,…,r; i = 1,2,…,p, are the unknown parameters of the parametric components. The random errors ε_j are assumed to be mutually independent and normally distributed with zero mean and variance σ^2. Following Wang (2011), the shape of the regression curve f_k is unknown and f_k is assumed to be contained in the Sobolev space. Equation 1 can be rewritten as Equation 2, for i = 1,2,…,p. The nonparametric regression curves f_k are estimated by the PLS method, that is, by minimizing a penalized least squares criterion.
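The displayed equations of this subsection are not reproduced here, so the block below is a reconstruction under stated assumptions rather than the authors' exact formulas. It assumes a single intercept γ_0 and coefficients γ_hi for h = 1,…,r, which matches the nine coefficients used in the simulation study (1 + 4·2), and it uses the standard definitions of the Sobolev space and of the penalized least squares criterion with one smoothing parameter λ_k per function f_k.

```latex
% Assumed form of the model (Equation 1): single intercept, polynomial terms of degree r
\[
y_j = \gamma_0 + \sum_{i=1}^{p}\sum_{h=1}^{r} \gamma_{hi}\, x_{ij}^{h}
      + \sum_{k=1}^{q} f_k(z_{kj}) + \varepsilon_j ,
\qquad \varepsilon_j \sim N(0,\sigma^2), \quad j = 1,\dots,n .
\]

% Standard definition of the Sobolev space of order m on [a,b]
\[
W_2^m[a,b] = \Big\{ f : f, f', \dots, f^{(m-1)} \text{ absolutely continuous},\;
\int_a^b \big(f^{(m)}(z)\big)^2 \, dz < \infty \Big\} .
\]

% Assumed form of the penalized least squares (PLS) criterion
\[
\frac{1}{n}\sum_{j=1}^{n}\Big( y_j - \gamma_0
   - \sum_{i=1}^{p}\sum_{h=1}^{r} \gamma_{hi}\, x_{ij}^{h}
   - \sum_{k=1}^{q} f_k(z_{kj}) \Big)^{2}
 + \sum_{k=1}^{q} \lambda_k \int_a^b \big(f_k^{(m)}(z)\big)^2 \, dz .
\]
```

The first term of the criterion measures fidelity to the data and the second penalizes roughness of each f_k, so larger values of λ_k give smoother estimates.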
To obtain the smoothing spline estimators in the semiparametric additive regression model, Wang (2011) used an extension of Wahba (1990). The general spline function can be written in matrix form as f_k = T_k α_k + θ_k V_k β_k for k = 1,2,...,q, and Equation 2 can be solved by minimizing the criterion in Equation 4. With ..,α), β = (β_1,…,β_n)^T, S = (X T), X = (X_1,…,X_p), T = (T_1,…,T_q) and V_θ = θ_1 V_1 + … + θ_q V_q, Equation 4 can be written as Equation 5. Taking the partial derivatives of Equation 5 with respect to µ and β and setting them equal to zero yields Equations 6 and 7, where I is the identity matrix and M = V_θ + nλI. Based on Equations 6 and 7, we obtain the estimators for the smoothing spline in the semiparametric multivariable regression model.
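For orientation, the derivation just described can be sketched in the notation of this subsection. The block below uses the standard penalized least squares form from the smoothing spline literature (Wahba, 1990) and is assumed, not confirmed, to correspond to Equations 5 to 7; the fitted-value expression in the last line is added for illustration.

```latex
% Assumed matrix form of the criterion (cf. Equation 5)
\[
\frac{1}{n}\,\lVert y - S\mu - V_\theta \beta \rVert^2 + \lambda\, \beta^{T} V_\theta\, \beta .
\]

% Setting the partial derivatives with respect to \mu and \beta to zero,
% with M = V_\theta + n\lambda I (cf. Equations 6 and 7)
\[
S^{T}\big(y - S\mu - V_\theta \beta\big) = 0 ,
\qquad
M\beta = y - S\mu .
\]

% Resulting estimators and fitted values
\[
\hat\mu = \big(S^{T} M^{-1} S\big)^{-1} S^{T} M^{-1} y ,
\qquad
\hat\beta = M^{-1}\big(y - S\hat\mu\big) ,
\qquad
\hat y = S\hat\mu + V_\theta \hat\beta .
\]
```

The step from the two stationarity conditions to the estimator of µ uses y − Sµ − V_θ β = nλ M^{-1}(y − Sµ), which follows from substituting β = M^{-1}(y − Sµ).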

Parameter Estimation
In the Bayesian approach, the choice of prior distribution is very important. The prior distributions used in this study are restricted to an improper Gaussian prior for the nonparametric components and a multivariate normal prior for the parametric components. In the Bayesian approach, the point estimate is obtained from the posterior mean and the interval estimate is obtained from the posterior variance.
The observations (x_ij, z_kj, y_j), j = 1,…,n; i = 1,…,p; k = 1,…,q, are regarded as obtained from a stochastic process {y(x, z), x, z ∈ [a,b]} that follows model (1). The prior distribution of f_k is defined in Equation 8, where W(u) is a Wiener process with zero mean and reproducing kernel covariance θ_k V_k. Moreover, α and g_k(z_k) are mutually independent. Hence, {h(x,z), x, z ∈ [a,b]} has the improper prior distribution given in Equation 9. The error process ε is a Gaussian process with zero mean and Cov(ε(z_j), ε(z_l)) = σ^2 for j = l and zero otherwise. Let y, h and ε be Gaussian random vectors with zero mean that follow model (1). Then, by a standard result on the multivariate normal distribution, e.g., Johnson and Wichern (2001) (Result 4.6), the conditional distribution of m given y and X is normal with the mean and covariance given in Equations 10 and 11, with λ = σ^2/nη. Equation 9 can be written in matrix form, where µ ~ N(0, πI) and π → ∞. Assuming that µ and g_k(z_k) are mutually independent and using a quadratic loss function, the Bayes estimator is the posterior mean of h.
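The standard multivariate normal result invoked above can be stated in general form as follows. The block symbols Σ_mm, Σ_my and Σ_yy are generic notation introduced here for illustration; they are not the specific covariance matrices of Equations 10 and 11.

```latex
% Conditional distribution for jointly normal vectors (generic block notation)
\[
\begin{pmatrix} m \\ y \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \Sigma_{mm} & \Sigma_{my} \\ \Sigma_{ym} & \Sigma_{yy} \end{pmatrix}
\right)
\;\Longrightarrow\;
m \mid y \sim N\!\big( \Sigma_{my}\Sigma_{yy}^{-1}\, y ,\;
\Sigma_{mm} - \Sigma_{my}\Sigma_{yy}^{-1}\Sigma_{ym} \big) .
\]
```

Under quadratic loss the Bayes estimator is the conditional mean, Σ_my Σ_yy^{-1} y, and the conditional covariance supplies the posterior variance used for interval estimation.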

Smoothing Parameters Selection Method
Smoothing spline estimators depend on the smoothing parameters. Hence, the selection of the smoothing parameters is crucial for the performance of the smoothing spline function estimates. The method used here to select the smoothing parameters (λ_1,…,λ_q) for smoothing spline estimators in the semiparametric additive regression model with a Bayesian approach is Generalized Maximum Likelihood (GML). The GML method was first used by Wahba (1985) in a nonparametric regression model. Consider w_1 and w_2 with the following decomposition:
By determining both of these distributions, we find that only w_1 contains the smoothing parameters λ_1,…,λ_q. The log likelihood function is obtained from the distribution of w_1 and yields the maximum likelihood estimator of η. Substituting this estimator of η into the log likelihood function gives an expression in which K_1 and K_2 are constants that do not depend on λ_1,…,λ_q and η. Maximizing log L(λ_1,…,λ_q | w_1) is equivalent to minimizing: (F MF) ∑ ∑ where T 1 w = F y. The optimal values of λ_1,…,λ_q are those that minimize GML(λ_1,…,λ_q).
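As a point of reference, the GML criterion of Wahba (1985) has the form sketched below. The notation is an assumption made to align it with this section: F_1 denotes an n × (n − d) matrix with orthonormal columns orthogonal to the column space of S (for example from a QR decomposition of S), d is the number of columns of S, w_1 = F_1^T y, and M = V_θ + nλI. Whether this coincides term by term with the criterion derived here cannot be checked from the surviving text.

```latex
% GML criterion in the form of Wahba (1985), written with the assumed notation above
\[
\mathrm{GML}(\lambda_1,\dots,\lambda_q)
= \frac{ w_1^{T} \big(F_1^{T} M F_1\big)^{-1} w_1 }
       { \Big[ \det \big(F_1^{T} M F_1\big)^{-1} \Big]^{1/(n-d)} } ,
\qquad w_1 = F_1^{T} y .
\]
```

The numerator is a weighted residual sum of squares and the denominator is a determinant term that penalizes undersmoothing; the criterion is minimized jointly over the smoothing parameters.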

Confidence Interval
Confidence intervals for semiparametric estimates can be constructed by the bootstrap or by a Bayesian approach. The disadvantage of bootstrap confidence intervals is that they are more computationally intensive. Hence, to compute confidence intervals for the smoothing spline functions f_k and the parameters γ in the semiparametric additive regression model, we use the Bayesian approach. Based on the prior (9) and Equations 11 and 12, we obtain: Var(m | y) = η s s + θ ψ -( s S + θ ψ ) Taking the limit of the posterior variance of h as ζ → ∞ gives the limiting posterior variance. Therefore, by calculating the posterior mean and posterior variance of γ and f_k, we can construct confidence intervals for the parameters γ and for the smoothing spline functions f_k in the semiparametric additive regression model.
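Once posterior means and variances are available, the interval construction itself is short. The sketch below is a minimal illustration in Python, not the authors' R code; it assumes a normal posterior and pointwise intervals of the form mean ± z·sd, and the function name `bayesian_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def bayesian_interval(post_mean, post_var, level=0.95):
    """Pointwise intervals mean +/- z * sqrt(variance), assuming a normal posterior.

    post_mean and post_var are equal-length sequences holding the posterior
    mean and posterior variance of each parameter or function value.
    """
    # Two-sided normal quantile, about 1.96 for level = 0.95
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    intervals = []
    for mean, var in zip(post_mean, post_var):
        half_width = z * math.sqrt(var)
        intervals.append((mean - half_width, mean + half_width))
    return intervals
```

The same function applies to a coefficient vector γ and to the values of a fitted function f_k on a grid, since both cases only need a mean and a variance per point.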

Simulation Study
In the simulations, we generated data from the semiparametric additive regression model in (1) with n = 100, p = 4, r = 2, q = 2 and m = 2. For the parametric part, we set γ = (0.0, 1.0, 0.8, 1.4, 0.6, 1.2, 0.9, 1.1, 1.2)^T. The x_i's were generated from a multivariate normal distribution with zero mean and Cov(x_ij, x_ik) = 0.5^|j−k|. For the nonparametric part, the true functions were set to be: The z_ik's were generated independently from the uniform distribution on [0, 1]. The random errors ε_j were generated from a normal distribution with zero mean and standard deviation σ = 0.9.
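The data-generating design described above can be sketched as follows, in Python and using only the standard library. Several parts are assumptions and are marked as such in the code: the true functions f_1 and f_2 are hypothetical stand-ins because the original definitions are not available, the ordering of γ (intercept first, then the coefficients of x and x^2 for each predictor) is assumed, and the correlated predictors are produced by an AR(1) recursion, which yields unit variances and Cov = 0.5^|j−k|.

```python
import math
import random

# Coefficients from the simulation design; the ordering (intercept, then
# x_i and x_i^2 for i = 1..4) is an assumption.
GAMMA = [0.0, 1.0, 0.8, 1.4, 0.6, 1.2, 0.9, 1.1, 1.2]


def f1(z):
    """Hypothetical stand-in for the first true function."""
    return math.sin(2.0 * math.pi * z)


def f2(z):
    """Hypothetical stand-in for the second true function."""
    return (2.0 * z - 1.0) ** 2


def simulate_data(n=100, p=4, r=2, sigma=0.9, rho=0.5, seed=1):
    """Return a list of (x, z, y) tuples drawn from the assumed model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # AR(1) recursion: unit variances and Cov(x_j, x_k) = rho ** |j - k|
        x = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            x.append(rho * x[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        # Two independent uniform covariates on [0, 1) for the nonparametric part
        z = [rng.random(), rng.random()]
        # Parametric part: intercept plus polynomial terms up to degree r
        mean = GAMMA[0]
        idx = 1
        for xi in x:
            for h in range(1, r + 1):
                mean += GAMMA[idx] * xi ** h
                idx += 1
        # Nonparametric part and Gaussian noise with standard deviation sigma
        y = mean + f1(z[0]) + f2(z[1]) + rng.gauss(0.0, sigma)
        rows.append((x, z, y))
    return rows
```

Calling `simulate_data(sigma=1.0)` or `simulate_data(sigma=0.6)` gives data sets with the other two error standard deviations considered in the study.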
Based on the GML method, we wrote R code for choosing the smoothing parameters simultaneously. The optimal smoothing parameters are then used to obtain the parameter estimators for the parametric components and the smoothing spline estimators for the nonparametric components of the semiparametric additive regression model. The GML values for several smoothing parameter combinations on the simulated data are given in Table 1, ordered from small (line 1) to large (line 8) smoothing parameters. Among the eight combinations applied to the model, the optimal smoothing parameters are λ_1 = 2.083E-05 and λ_2 = 5.000E-05, which give the smallest GML value, 48.01812 (line 4). Using these optimal smoothing parameters, the estimates of the parametric component γ and the corresponding 95% confidence intervals are given in Table 2. Figure 1 shows the smoothing spline function estimates together with their 95% confidence intervals for σ = 0.9. The rounded lines are the true functions, the solid lines are the smoothing spline curves with the optimal smoothing parameters and the dashed lines are the lower and upper confidence limits. Comparing these with the smoothing spline function estimates and 95% confidence intervals for σ = 1.0 and σ = 0.6 in Fig. 2, the estimate for σ = 0.6 is closer to the true function, and the method also performs reasonably well. This can also be seen from the Mean Square Error (MSE) for the three error standard deviations in Table 3.
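The selection step described above, evaluating a criterion over a set of candidate smoothing parameter pairs and keeping the pair with the smallest value, can be sketched in Python as below, together with the MSE used to compare fits. This is an illustration rather than the authors' R code; `criterion` is a placeholder for any function returning the GML value of a pair (λ_1, λ_2), and both function names are hypothetical.

```python
from itertools import product


def select_smoothing_parameters(criterion, grid1, grid2):
    """Return ((lambda1, lambda2), value) minimizing criterion over the grid.

    criterion is any callable taking (lambda1, lambda2) and returning a
    number, for example a GML value computed from the data.
    """
    best = min(product(grid1, grid2), key=lambda pair: criterion(*pair))
    return best, criterion(*best)


def mean_squared_error(truth, estimate):
    """Average squared difference between true and estimated values."""
    return sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth)
```

A grid search of this kind is only as fine as the candidate values supplied, so the selected pair is optimal among the candidates rather than over all positive smoothing parameters.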

DISCUSSION
The prior distributions used in this study are an improper Gaussian distribution for the nonparametric components and a multivariate normal distribution for the parametric components. The smoothing spline estimators were obtained by using the Bayesian approach. Our result is identical to the smoothing spline estimator obtained by the PLS approach. The confidence intervals for the parameters of the parametric components and for the smoothing spline functions of the nonparametric components in the semiparametric additive regression model can be constructed through the Bayesian approach. Using the Bayesian approach, we also obtain the GML method for selecting the optimal smoothing parameters simultaneously, where the shape of the smoothing parameters is fixed.

CONCLUSION
The smoothing spline in the semiparametric additive regression model with a Bayesian approach extends the Bayesian smoothing spline for the nonparametric component. Using the Bayesian approach, we obtain parameter estimators for the parametric components, smoothing spline estimators for the nonparametric components and a GML-based method for selecting the smoothing parameters simultaneously in the semiparametric additive regression model. In addition, by computing the posterior mean and posterior variance of the parameters γ and the functions f_k, confidence intervals can be constructed for the parametric component parameters and for the smoothing spline functions of the nonparametric components. The numerical example shows that the estimation and inference methods work well on simulated data. The remaining problem is to apply this model to real data. Further studies may consider estimating the smoothing parameters simultaneously through Markov Chain Monte Carlo (MCMC) and using other prior distributions in semiparametric additive regression models.

ACKNOWLEDGEMENT
The first researcher would like to thank the Indonesian Central Bureau of Statistics (BPS), Indonesia