Univariate Generalized Additive Models for Simulated Stationary and Non-Stationary Generalized Pareto Distribution

Corresponding Author: Mostafa Behzadi Department of Mathematics, Faculty of Science, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia Email: mostafabehzady@gmail.com Abstract: Generalized additive models as a predictor in regression approaches, are made up over cubic spline basis and penalized regression splines. Despite of linear predictor in GLM, generalized additive models use a sum of smooth functions of covariates as a predictor. The data which are used in this study have generalized Pareto distribution and have been simulated by inversion method. The data are generated in two types, the stationary case and the non-stationary case. The method of root mean square of errors as a method of measurement is used for comparison between power of predictions which are based on penalized regression splines as a method in univariate generalized additive models and linear regression based on maximum likelihood estimation. The finding of this research illustrates that the amount of accuracy of estimation of parameter of location in UGAM approach as an alternative promising of modelling through each specialized GPD's models, has less RMSE in compare with MLE.


Introduction
During early of 1990, generalized Additive models, GAMs, by attempts of Hastie and Tibshirani have been developed (Hastie and Tibshirani, 1990). The construction of GAM, is relied on back-fitting with linear smoothers. Although this model seems to be flexible, but, in practice, some difficulties like how to inference and how to select the model be revealed. To get the solution of these difficulties, the sensitive mathematical work of Wahba and her colleagues (Wahba, 1990) on generalized spline smoothing, GSS, provided a consistent and precise framework to inference. Smoothing splines, are basis for model selection (Gu and Wahba, 1991a) with GAMs (Gu and Wahba, 1991b). Computationally, the selection of these models are very expensive. Applying penalized regression splines, are a "middle approach" (Wahba and Wendelberger, 1980) to build GAMs (Eilers and Marx, 1996;Wood, 2006). Recall that GAMs is based on smooth functions and has ability to predict. Therefore, for better prediction, a basis and "wiggliness" penalty for every smooth function in the model is selected. Univariate generalized additive models, provide a flexible and smooth approach to distinguish and identify the non-linearity's covariate effects in a several types of modelling situations and circumstances (Wood, 2006).
The simulated data are generated based on inversion method, in this study. It has been used from R software to program the approaches, problems and solutions. Generalized pareto distribution is belong to a family of continuous probability distributions and often is used to model the tails of another distributions. At first, this paper will pay attention to the root of origination of univariate generalized additive models and then will go to concentrate on parameter of location. The goal is to display the ability of amount of accuracy of estimation of parameter of location. To get the goal, the method of penalized regression spline in univariate generalized additive models in stationary and non-stationary GPD data are compared in contrast of another approach, MLE.
Section 2, presents the basic materials in details of structure of GAM. Section 3, deals with different types of simulated GPD data and their inversions. The maximum likelihood estimation of each type of data are calculated and provided in section 3, as well. Section 4, display the implementation of GAM and goes to the details of this method like cubic spline and penalized regression spline. Section 5, is consist of results of this research. The results are included some figures and related tables. The conclusion is placed in section 6.

Basic Materials
GAM has resemblance to a GLM in which a linear combination of scatter-plot smoothers is replaced instead of linear combination of explanatory variables. The general structure of a GAM can be illustrated as bellow: where, i X * , is a row of the model matrix for any strictly parametric model components, θ, is the corresponding parameter vector and f j , are smooth functions of the covariates, x k (Wood, 2006). In addition: where, Y i , is a dependent variable which is distributed as some of the exponential family distribution. The indication of smooth function is best introduced with respect to a model which has one smooth function of one covariate. This model is: where, y i , is an outcome or response variable and x i is covariate, f is a smooth function and 2 (0, ) i N ε σ ∼ random variables, for i = 1,2,…,n (Wood, 2006).
To show f, as a simple predictor, one can choose a known basis function.
Suppose that b j (x) is the jth basis function, then f, is modelled to have a formula (Wood, 2006) for a number of unknown parameters which should be estimated. By replacement of Equation 3 into y i = f(x i ) + ε i , a linear model will be constructed.
A cubic spline can be applied as a representative for univariate functions. A cubic spline is made of combining of cubic polynomials which are connected to each other to build a special curve. This new curve is continuous in value, as well as, in first and second derivatives (Wood, 2000). The knots of the splines are illustrated by the points where the sections have connection (Hastie and Tibshirani, 1987). What approach is applied, let the location of knots be marked by { : 1,..., 2} (Wood, 2006). For a formal spline, wherever there is some datum, the knots occur. In general, the knots could either be spread among the range of observed x, or put it at quantiles of the distribution of unique x values (Hastie and Tibshirani, 1987). With regard to the location of knots, many alternatives can occur (Gu, 2002). Let consider basis b 1 (x) = 1, b 2 (x) = x and b i +2 = R(x,x * ) for i = 1,…, q-2, where: In order to making function of "f", by applying the cubic spline basis Equation 2, will be a linear model: Y = Xβ +ε, where the ith row of the model matrix is: therefore, the model could be approximated by least squares (Wood, 2002;Gu and Kim, 2002).
Instead of fitting the model by minimizing: where, Y, is a dependent variable and X, is model matrix.
We are able to minimize: in which λ, displays the smooth parameter in where the integrated square of second derivative penalizes models that are too wiggly. Always, the penalty can be written as a quadratic form in β, i.e.: where, S, is a matrix of known coefficients. Here, the form of spline basis is a little complicated and this complication proves its worth. Notice that: while, the first two rows and columns of S are 0. Accordingly, indeed, the penalized regression spline fitting problem is to minimize: with regard to β. In practice, the explanations above, is not useful for computation. The method of orthogonal matrix offers greater numerical stability. In practical solution, the bellow formula can be applied: where B , is any square root of the matrix S, note that B T B = S (Wood, 2006). This research is used from generalized cross validation, GCV, as an evaluator method to trade off the validity and accuracy of modelling. GCV has computational benefits, in addition, has benefits in terms of invariance (Wahba, 1990;Gu, 1992).

Simulated Data
This section illustrates the special GPD distribution function briefly. The GPD distribution function is belong to exponential distributions functions family and has three parameters, location: µ∈R, scale: σ>0 and shape ξ∈R (Arnold, 2011). If the parameters of shape and location be equal to zero, then generalized Pareto distribution is equal to exponential distribution. On the other hand, the GPD is equal to Pareto distribution where shape is considered to be ξ>0 and location is regarded to be σ µ ξ = (Coles et al., 2001). This paper deals with two types of GPD data:

Stationary Case
The meaning of stationary case of GPD distribution function is the function which has determined parameters. In other words, all parameters such as µ, σ and ξ are equal to a specified value, in their defined domain and are constant. In order to generate simulated stationary data, the inversion method to simulate should be applied computing the inversion function of stationary case of GPD distribution function makes an ability to produce the stationary data: therefore, the inverse of this function will be: to calculate the function of MLE, the production of p.d.f consequently: Calculation of maximum likelihood of stationary case of GPD distribution function gives an opportunity to estimate of parameters and obtains base on probability distribution function of GPD:

Non-stationary case
Whereas the parameter of location in this case of GPD, µ t , is equal to t σ ξ , then, the non-stationary case of GPD has a trend in its mean while scattering of data is increased: The function after replacement is: and the inversion of this function is obtained: The most important usage of maximum likelihood function is estimation of unknown parameters. Indeed, the duty of this function is to estimate the two parameters included σ and ξ. By substitution, the parameter of location automatically is calculated, To get the goal: suppose that θ, is representative of all parameters, therefore, in production: consequently:

Implementation of Univariate Generalized Additive Models for Each Set of Data
The construction of GAM is based on cubic spline basis and penalized regression spline. In fact, in this structure, polynomial bases are very useful for positions which interest concentrates on attributes of f in the neighborhood of a single determined point (Gu, 1992). However, when the query of interest relates to f on its whole domain ([0,1]), the polynomial bases have some difficulties on the border. The splines bases fulfill well in this circumstances, greatly, because spline basis can be illustrated to have good approximation theoretic properties.
First of all, it should be written a function which has ability to take a sequence of an array of x values to make a model matrix for the spline. Secondly, the only thing that remains is to select a sequence of knots, i x * . Briefly, it is displayed how to deal with these steps inside the programming: • get the knots • get the penalty matrices • get the model matrix

Stationary Data
The feature of histogram and density function of GPD in stationary case has been displayed in Fig. 1a. As it is observed in Fig. 1b, there are six predicted colored lines. Each sub-figure represents a number of specified knots which could describe the function of GAM only for parameter of location. The fitting of the model seems enough reliable and acceptable. Adjusting and qualifying the basis dimension, q, is arbitrary. In other words, the selection of degree of model smoothness which is the number of knots +2, is basically optional (Wood, 2006;Durrleman and Simon, 1989). The arbitrary selection needs to more research and means that if a satisfactory theory for modelling with smooth function is to be extended, the number of knots must be shown. The smoothing parameter, λ, controls the specification between model fit and model smoothness (Kim and Gu, 2004). If λ = 0, means that there is no penalty and if λ→∞, means that something like a straight line is going to estimate f (Wood and Augustin, 2002;Eilers and Marx, 2010).
To apply the estimated function effectively, selection of q, j x * and λ are very significant. Hence, it must be considered to an important matter which emphasizes that the selecting of number of knots should be large enough, which basis functions could provide enough flexibility to illustrate f(x). Accordingly, neither the precise of knots locations nor the exact selection of number of knots has a big influence on the model fit. As regards, the selected λ, has a great role in determining model flexibility and finally the approximated form of ˆ( ). f x By determining λ = 10 −8 as a default to originate the program, thenλ , with respect to minimization is obtained by using this formula (Wood and Augustin, 2002): According to Equation 14, λ for stationary data is obtained: Note that the program begins by a chosen value as a default for λ, empirically. In continue, according to the GCV function, the fitted model is depicted. In Fig. 1c the minimum of GCV function has occurred in point of 43 and V (i) is equal to 2.396. Indeed, this point is the global landmark of this function. According to this obtained landmark, it is possible to guess that the fitted line has a big curve along itself. The numbers of curves during the fitted line are a reaction of global and local landmarks of GCV function. Figure 1d is the reference for application of minimum GCV function.

Non-Stationary Data
The non-stationary of GPD distribution function has a trend in its parameter of location, µ t (Hamilton, 1994). This parameter, itself, is a new function and has two parameters such as σ and ξ. In this case σ t is equal to κt, therefore the µ t will be equal to t κ ξ . The shape parameter can take any value in the range of real number and the scale parameter can take any nonnegative real number. As the µ t shows itself, it is a linear function without intercept. In this function the slope is the t κ ξ . Figure 1h proves that there is no intercept for the predicted lines based on this case of GPD non-stationary data in the origin of the axes.
With respect to the properties of µ t , the data have two specifications simultaneously, the first specification is along the coefficient of slope and the second one is scattering generally more and more by increasing the value of "t" in each step (Dominici et al., 2002). The histogram and density function of GPD non-stationary function is illustrated in Fig. 1e. The six PRS fitted models are located in Fig. 1f. With refer to Fig. 1f, each fitted model from these six depicted PRS, displays a determined number of knots. Based on Equation 14 the computing of λ leads to 0.2479.

Results and Discussions
The result of this paper is gathered concisely in Table 1 and 2. The explanation of each type of data with consideration to related figure is categorized in below respectively: This panel demonstrates GPD stationary data. An illustration of the fitting models to predict the parameter of location based on estimated GAM and MLE's method are located in Fig. 1d. The blue line has been depicted according to MLE's estimations and the red one has been drawn based on estimation of GAM. These data have got homoscedasticity because the data scattered with the same variance. In Table 3, has been listed some various sample sizes with n = 20, 100 and 500. The column of GPD stationary data makes the ability to test the comparison of RMSE between two estimators. In each sample size, the RMSE which is calculated by GAM's estimation is less than MLE's method. As it is observed, the structure of GAM helps to predict the parameter of location better among this type of data. By investigating the stationary column in Table 3, after increasing the sample size from 100 to 500, a decrease has happened in amount of values of GAM's estimation for obtaining RMSE. . Thus, these data have got heteroscedasticity, because scattered with different variances. With respect to the Fig. 1h, the red line or GAM's prediction, is almost a straight line as the blue prediction which is depicted based on MLE. As it is realized from this panel, the prediction based on GAM has been depicted among the data, but the prediction based on MLE, has been drawn near the lower threshold of these data. Hence, by looking at this phenomenon and before searching Table  3, it is possible to guess that the estimation by GAM has less RMSE for prediction of parameter of location, rather than MLE's method.   The column which displays the non-stationary case in Table 3, is contain the calculated RMSE for different sample sizes. According to Table 3, by increasing the number of data in each sample size, the RMSE based on MLE's method, increases much more than the RMSE which has been calculated according to GAM's estimation.
In estimation based on MLE in Table 3, there is just growth in value of RMSE with raising the sample size, but in estimation according to GAM, for n = 500, there is a decrease in calculated RMSE rather than n = 100.

Conclusion
The concentration and focus of studying in this study is over parameter of location. The results of this paper illustrate the extension of conception of generalized additive models in order to evaluate the accuracy of estimation of parameter of location. This development is displayed by related figures. In addition, the parameter of location is monitored by method of MLE, as well. In order to compare the amount of accuracy of estimation of parameter of location 1 is prepared. According to RMSE values, the amount of accuracy of estimation of parameter of location based on PRS in GAM is better than Method of MLE via GPD models.
In general terms, the finding illustrate that the estimation based on GAM has less root mean square of errors, hence, the estimation based on GAM is more powerful than findings based on MLE.