ESTIMATION AND ANALYSIS OF THE RELATIONSHIP BETWEEN THE ENDOGENOUS AND EXOGENOUS VARIABLES USING FUZZY SEMI-PARAMETRIC SAMPLE SELECTION MODEL

An important development of the last decade in the selectivity-model approach is the use of semi-parametric methods to overcome the inconsistent results obtained when the distributional assumptions imposed on the error terms are violated. However, uncertainty and ambiguity remain in these models, particularly in the relationship between the endogenous and exogenous variables. A new framework for the relationship between the endogenous and exogenous variables of the semi-parametric sample selection model is introduced, using the concept of fuzzy modelling. Through this approach, a flexible fuzzy concept is hybridised with the semi-parametric sample selection model, yielding the Fuzzy Semi-Parametric Sample Selection Model (FSPSSM). The elements of vagueness and uncertainty in the models are represented in the model construction, as a way of increasing the available information to produce a more accurate model. This led to the development of a convergence theorem, presented in the form of triangular fuzzy numbers, for use in the model; proofs of the theorems are also presented. An algorithm using the concept of fuzzy modelling is developed, and the effectiveness of the estimators for this model is investigated. Monte Carlo simulation revealed that consistency depends on the bandwidth parameter. When the bandwidth parameter c is increased through 0.1, 0.5, 0.75 and 1 and the number of observations N increases from 100 to 200 and then to 500, the mean values approach the true parameter. The bandwidth analysis also reveals that the estimated parameter is efficient, i.e., the S.D., MSE and RMSE values become smaller as N increases. In particular, the estimated parameter becomes consistent and efficient as the bandwidth parameter approaches infinity, c→∞, as the number of observations tends to infinity, n→∞.


INTRODUCTION
The sample selection model, or selectivity model, introduced by Heckman (1979) is one of the most successful regression models when applied together with other models. This model combines the probit and regression models. Earlier studies of this model focused on the parametric approach. However, the standard parametric approach to estimating the sample selection model gives inconsistent results if the distributional assumptions made on the error terms do not hold. Hence, an important development within the last decade has been the use of semi-parametric methods as an alternative approach to overcome this problem (Andrews, 1991; Cosslett, 1990; Gerfin, 1996; Ichimura and Lee, 1991; Khan and Powell, 2001; Klein and Spady, 1993; Lee and Vella, 2006; Martins, 2001; Powell, 1987; Powell et al., 1989).
Science Publications AJAS
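Under the joint-normality assumption of Heckman (1979), selectivity enters the regression step through the inverse Mills ratio λ(t) = φ(t)/Φ(t) evaluated at the probit index, and it is exactly this parametric form that the semi-parametric methods cited above avoid assuming. The Python sketch below is our own illustration, not part of the original study: it checks by simulation that E[u | x'β + e > 0] = ρλ(x'β) when (e, u) are standard bivariate normal with correlation ρ. The function names and the index and ρ values are hypothetical.

```python
import math
import random


def norm_pdf(t: float) -> float:
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t: float) -> float:
    """Standard normal distribution function Phi(t), via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def inverse_mills(t: float) -> float:
    """lambda(t) = phi(t) / Phi(t): Heckman's selectivity correction term."""
    return norm_pdf(t) / norm_cdf(t)


def simulated_selected_mean(index: float, rho: float,
                            draws: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[u | index + e > 0] for standard bivariate
    normal errors (e, u) with correlation rho."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(draws):
        e = rng.gauss(0.0, 1.0)
        # u = rho * e + sqrt(1 - rho^2) * eta gives corr(e, u) = rho
        u = rho * e + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if index + e > 0.0:  # participation equation: d = 1
            total += u
            kept += 1
    return total / kept


simulated = simulated_selected_mean(index=0.5, rho=0.6)
theoretical = 0.6 * inverse_mills(0.5)
print(round(simulated, 3), round(theoretical, 3))
```

If the errors are not jointly normal, the conditional mean is no longer ρλ(·), which is why the two-step parametric estimator can become inconsistent.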
Although the semi-parametric approach to the selectivity model is established, a basic problem of intrinsic features still exists, namely uncertainty and ambiguity, particularly in the relationship between the endogenous and exogenous variables. This can impair the ability and effectiveness of the model in producing estimates that explain the actual situation of a phenomenon. These questions have yet to be explored and form the main pillar of this study. An alternative way to deal with such uncertainty and ambiguity is to use the fuzzy concepts introduced by Zadeh (1965). Accordingly, a new framework for the relationship between the endogenous and exogenous variables of the semi-parametric sample selection model is introduced using fuzzy modelling. Through this approach, a flexible fuzzy concept is hybridised with the semi-parametric sample selection model, yielding the Fuzzy Semi-Parametric Sample Selection Model (FSPSSM).
The purpose of this paper is twofold. Firstly, it is to provide a better understanding of the magnitude of consistency and efficiency when the new FSPSSM is implemented under the normality assumption; this is then extended to verify the inconsistency of the model when the errors do not follow a normal distribution. Secondly, it is to provide the magnitude of consistency under the FSPSSM, for which the bandwidth parameter of the Powell (1987) model is used. To achieve these aims, Monte Carlo simulations are developed in the R programming language following Safiih (2013), using the estimators introduced by Powell (1987) and Powell et al. (1989) hybridised with the fuzzy concept.

The Semi-parametric Sample Selection Model (SPSSM)
The semi-parametric sample selection model lies between the two extremes of the modelling spectrum, i.e., it combines some advantages of both the fully parametric and the completely nonparametric approaches. The first equation, i.e., the participation equation, is estimated by a parametric method, while the outcome equation is estimated by a nonparametric method. For instance, Newey et al. (1990) and Martins (2001) used a two-step semi-parametric approach to model (1) that generalises Heckman's two-step procedure: in the first step, the participation equation is estimated semi-parametrically using the estimator proposed by Klein and Spady (1993), and the results of this first step are used to construct a nonparametric correction term for the selectivity of the wage equation in the second step. The difference between the parametric and semi-parametric approaches lies in the weaker assumptions made on the error terms. The two-step estimation procedure refers to the estimation of the participation and outcome equations, as mentioned in Lola et al. (2009). Consider the binary selection model and proceed by specifying the parametric part of the model. Order the N observations such that the first n observations (i = 1, ..., n) are participants with d_i = 1 and y_i observed; the remaining observations are nonparticipants with d_i = 0 and y_i unobserved. In this study, the semi-parametric method in Equation 1 is considered. As in the FPSSM paper of Safiih (2013), this method involves two steps. In the first step, the parameter β in the participation equation is estimated using the Density Weighted Average Derivative Estimator (DWADE) proposed by Powell et al. (1989); in the second step, the Powell (1987) estimator is used to estimate the parameter γ in the outcome equation.
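The first-step DWADE can be sketched in Python as follows. This is our own minimal illustration, not the R code used in the study: it assumes a product Gaussian kernel and a single bandwidth h for all regressors, and the function names and simulation design are hypothetical. It uses the pairwise (U-statistic) form of the estimator; its estimand δ = E[f(x) ∂E(d|x)/∂x] is proportional to β in an index model, so the usage example compares the ratio of the two estimated components with the ratio of the true coefficients rather than their levels.

```python
import math
import random


def dwade(x, y, h):
    """Density-weighted average derivative estimate (Powell, Stock and Stoker, 1989).

    Pairwise form:
        delta_hat = C(N,2)^{-1} * sum_{i<j} h^{-(k+1)} * u_ij * K(u_ij) * (y_i - y_j),
    with u_ij = (x_i - x_j) / h and a product Gaussian kernel K, for which
    -grad K(u) = u * K(u).  x is a list of k-vectors, y a list of responses.
    """
    n, k = len(x), len(x[0])
    norm_const = (2.0 * math.pi) ** (-k / 2.0)  # normalising constant of K
    delta = [0.0] * k
    for i in range(n):
        for j in range(i + 1, n):
            u = [(x[i][m] - x[j][m]) / h for m in range(k)]
            kern = norm_const * math.exp(-0.5 * sum(v * v for v in u))
            dy = y[i] - y[j]
            for m in range(k):
                delta[m] += u[m] * kern * dy
    scale = 1.0 / (n * (n - 1) / 2.0 * h ** (k + 1))
    return [scale * d for d in delta]


def simulate_participation(n, beta, seed):
    """Hypothetical design: x ~ N(0, I), d = 1{x'beta + e > 0}, e ~ N(0, 1)."""
    rng = random.Random(seed)
    x, d = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in beta]
        index = sum(b * v for b, v in zip(beta, xi))
        x.append(xi)
        d.append(1.0 if index + rng.gauss(0.0, 1.0) > 0.0 else 0.0)
    return x, d


x, d = simulate_participation(n=800, beta=[1.0, 2.0], seed=42)
delta = dwade(x, d, h=1.0)
print(delta[1] / delta[0])  # should be roughly beta_2 / beta_1 = 2
```

The double loop makes the cost O(N²), which is acceptable for the sample sizes used in this study but would call for vectorisation in larger applications.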
The DWADE is based on sample analogues of the product-moment representation of average derivatives and is constructed using nonparametric kernel estimators of the density of the regressors. Of practical interest is that weighted average derivatives are proportional to the coefficient vector β in the index function. The estimator proposed by Powell (1987) is used to estimate the parameter γ in the second step of Equation 1. Powell (1987) considered a semi-parametric selection model that combines the two-equation structure with the following weak assumption about the joint distribution of the error terms.

It is assumed that the joint density f(.) of (ε_i, u_i), conditional on w_i, is smooth but unknown, and that it depends on w_i only through the linear index w_i'γ.
Based on these assumptions, the regression function for the observed outcome z_i takes the following form, where λ(.) is an unknown smooth function. Ideally, given two observations i and j with w_i ≠ w_j and w_i'γ = w_j'γ, the unknown function λ(.) can be differenced out by subtracting the regression functions for i and j. This is the basic idea underlying the estimator of γ proposed by Powell (1987). Once the weights ϖ̂_ij have been calculated, γ can be estimated by a weighted least-squares estimator. Here g(.) is an unknown but smooth function. Estimators for β in this model were discussed in section 2.5 as the first step of the semi-parametric procedure. Given the first-step estimate, the second step of the semi-parametric procedure consists of estimating γ using Equation 3. Powell (1987) proved that the estimator γ̂_Powell in Equation 3 is √n-consistent and asymptotically normal under an appropriately chosen kernel and bandwidth c, as stated in Equation 5:

$\sqrt{n}\,(\hat{\gamma}_{Powell} - \gamma) \xrightarrow{d} N(0, \Sigma)$,

where $\xrightarrow{d}$ denotes convergence in distribution and Σ is the asymptotic covariance matrix. The Powell procedure takes as input the data from the outcome equation (x and y, where x need not contain a vector of ones). In the first step, the index x_i'β̂ is estimated; this involves the vector id and the bandwidth vector c. Both id and c enter the DWADE, where they multiply an i.i.d. random sample and the k threshold parameter, respectively. The first element of c is used to estimate the intercept coefficient, and the bandwidths from the second element onward are used to estimate the slope coefficients.
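The pairwise-differencing idea behind the Powell (1987) estimator can be sketched in Python for a single outcome regressor. This is an illustration under stated assumptions rather than the study's implementation: the Gaussian kernel, the simulation design and all names (powell_slope, simulate_selected, v, w, z) are hypothetical, and the true selection index is used where in practice the first-step estimate x_i'β̂ would be plugged in. Pairs with close indices receive large weights, so the unknown correction term and the intercept cancel in the differences, and γ is the weighted least-squares slope of z_i − z_j on w_i − w_j.

```python
import math
import random


def powell_slope(index, w, z, c):
    """Pairwise-differencing (Powell-type) estimate of a scalar slope gamma.

    index, w, z hold the selected observations (d_i = 1) only: the selection
    index v_i, the outcome regressor w_i and the outcome z_i.  Pairs with close
    indices get large kernel weights K((v_i - v_j) / c), so the selectivity term
    lambda(v_i) - lambda(v_j) is close to zero and
        z_i - z_j ~ gamma * (w_i - w_j) + (error_i - error_j).
    The intercept also cancels, and the 1/c kernel factor cancels in the ratio.
    """
    num = den = 0.0
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            t = (index[i] - index[j]) / c
            weight = math.exp(-0.5 * t * t)  # Gaussian kernel (unnormalised)
            dw = w[i] - w[j]
            num += weight * dw * (z[i] - z[j])
            den += weight * dw * dw
    return num / den


def simulate_selected(n, gamma, rho, seed):
    """Hypothetical design: v ~ N(0,1), d = 1{v + e > 0}, w = v + xi,
    z = 0.5 + gamma * w + u with corr(e, u) = rho; returns selected rows only."""
    rng = random.Random(seed)
    v_sel, w_sel, z_sel = [], [], []
    for _ in range(n):
        v, e, xi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = rho * e + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if v + e > 0.0:  # participant: outcome observed
            w = v + xi
            v_sel.append(v)
            w_sel.append(w)
            z_sel.append(0.5 + gamma * w + u)
    return v_sel, w_sel, z_sel


v, w, z = simulate_selected(n=1000, gamma=1.0, rho=0.8, seed=7)
gamma_hat = powell_slope(v, w, z, c=0.3)
print(len(z), "participants; gamma_hat =", round(gamma_hat, 3))
```

A smaller c removes more of the selectivity bias but leaves fewer effective pairs, which is the bias-variance trade-off that the bandwidth experiments in this study examine.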

Fuzzy Modelling
In this study, we use the fuzzy set definition related to the existing fuzzy set theory introduced by Zadeh (1965). Definition 1 of fuzzy numbers follows Yen et al. (1999):

Definition 1
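The fuzzy function is defined by Ỹ, the codomain of x associated with the fuzzy set Ã. A fuzzy set Ã ⊂ F(ℜ) is called a fuzzy number if, in particular, it is normal and convex. The membership function of a Triangular Fuzzy Number (TFN) Ã is

$$\mu_{\tilde{A}}(x) = \begin{cases} \dfrac{x-l}{m-l}, & l \le x \le m, \\ \dfrac{u-x}{u-m}, & m \le x \le u, \\ 0, & \text{otherwise}, \end{cases}$$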

where l ≤ m ≤ u, x is a real number, and l and u are the lower and upper bounds of the support of Ã, respectively. The TFN is then denoted by (l, m, u). The support of Ã is the set {x ∈ ℜ | l < x < u}. By convention, a nonfuzzy (crisp) number occurs when l = m = u.
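The TFN operations used in the rest of the paper (membership, α-cut and the centroid used for defuzzification) can be sketched as a small Python class. The class and method names are our own, and the closed-form α-cut and centroid are the standard textbook expressions for a TFN (l, m, u), not formulas quoted from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """TFN (l, m, u) with l <= m <= u; l = m = u gives a crisp number."""
    l: float
    m: float
    u: float

    def __post_init__(self) -> None:
        if not (self.l <= self.m <= self.u):
            raise ValueError("a TFN requires l <= m <= u")

    def membership(self, x: float) -> float:
        """Triangular membership: rises linearly on (l, m], falls on [m, u)."""
        if x == self.m:
            return 1.0
        if self.l < x < self.m:
            return (x - self.l) / (self.m - self.l)
        if self.m < x < self.u:
            return (self.u - x) / (self.u - self.m)
        return 0.0  # outside the support {l < x < u}

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Interval {x : membership(x) >= alpha} for 0 < alpha <= 1;
        alpha = 0 returns the closure [l, u] of the support."""
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        return (self.l + alpha * (self.m - self.l),
                self.u - alpha * (self.u - self.m))

    def centroid(self) -> float:
        """Centre of gravity of the triangular membership function."""
        return (self.l + self.m + self.u) / 3.0


a = TriangularFuzzyNumber(1.0, 2.0, 4.0)
print(a.membership(3.0), a.alpha_cut(0.5), a.centroid())
```

Note that the α-cut shrinks linearly to the single point m as α approaches 1, which is the mechanism behind Theorem 1.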

Definition 2
Let X be a space of points with x ∈ X. Any x ∈ D ⊂ X for which there exists a membership function μ_D : X → [0, 1] is called fuzzy data. The process for obtaining fuzzy data is illustrated in Fig. 1. In this figure, x is the original data (Fig. 1a), which involves uncertainty; such data are called crisp uncertainty data (Fig. 1b) and are assigned the value 1 or 0. To obtain fuzzy data, fuzzification (Fig. 1c) with a membership function taking values in (0, 1] is carried out, followed by defuzzification (Fig. 1d).
The structure of fuzzy data, specifically the fuzzification process within the α_i-cut, is depicted in Fig. 2. The lower and upper bounds of each observation are determined by the triangular membership function.

Theorem 1
Let the fuzzy data be defined by a TFN. Then the coefficient values of the exogenous variables of the participation and wage equations for fuzzy data converge to the coefficient values of the exogenous variables of the participation and wage equations for crisp data, respectively, whenever the value of the α-cut tends to 1.

Fig. 1. Step-by-step process of obtaining fuzzy data: (a) original data, (b) crisp uncertainty data, (c) fuzzification, (d) defuzzification

Proof
To obtain the crisp value, the centroid method is used, and the fuzzy number for each observation ϖ_i is represented as a triangular fuzzy number. Applying the α-cut to the triangular membership function, the fuzzy number obtained depends on the chosen value of the α-cut in the range 0 to 1. Letting α approach 1 yields Equation 7, which states that as α approaches 1, $W_{ic}(\alpha)$ approaches the crisp value ϖ_i. In general, every observation of the fuzzy data becomes crisp, i.e., the fuzzy observations converge to x_i and z_i, respectively, as α tends to 1. This implies that the fuzzy data values of the participation and structural equations converge to the values of the participation and structural equations for crisp data, respectively, whenever the α-cut tends to 1.
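The limit in the proof can be written out explicitly under two assumptions of ours (the original equations are not reproduced in this section): each observation ϖ_i is fuzzified as a non-degenerate TFN (l_i, ϖ_i, u_i), and W_ic(α) denotes the midpoint of its α-cut. Both the width of the α-cut and the distance of its midpoint from ϖ_i are proportional to (1 − α), so they vanish as α → 1.

```latex
% Assumption: \varpi_i is fuzzified as the TFN (l_i, \varpi_i, u_i), l_i < \varpi_i < u_i.
\mu_i(x) =
\begin{cases}
  \dfrac{x - l_i}{\varpi_i - l_i}, & l_i \le x \le \varpi_i,\\[6pt]
  \dfrac{u_i - x}{u_i - \varpi_i}, & \varpi_i \le x \le u_i,\\[6pt]
  0, & \text{otherwise},
\end{cases}
\qquad
\mu_i(x) \ge \alpha \iff x \in
\bigl[\, l_i + \alpha(\varpi_i - l_i),\; u_i - \alpha(u_i - \varpi_i) \,\bigr]
\quad (0 < \alpha \le 1).

% Width of the alpha-cut and deviation of its midpoint W_{ic}(\alpha) from the crisp value:
\bigl[u_i - \alpha(u_i - \varpi_i)\bigr] - \bigl[l_i + \alpha(\varpi_i - l_i)\bigr] = (1 - \alpha)(u_i - l_i),
\qquad
W_{ic}(\alpha) - \varpi_i = \tfrac{1}{2}(1 - \alpha)\bigl[(u_i - \varpi_i) - (\varpi_i - l_i)\bigr].

% Both expressions are proportional to (1 - \alpha), hence
\lim_{\alpha \to 1} W_{ic}(\alpha) = \varpi_i .
```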

Development of Fuzzy Semi-Parametric Sample Selection Model
To formulate a fuzzy SPSSM, the SPSSM in Equation 1 is reconsidered. Towards the development of the FSPSSM, the same procedure as in Lola et al. (2009) is used, which involves three stages: (1) fuzzification, (2) fuzzy environment and (3) defuzzification. In the first stage, the elements of real-valued input variables, or crisp uncertainty values, are converted into fuzzy data using a particular membership function; a triangular fuzzy number with the α-cut method is used for all observations. In this study, the same α-cut method as in Equation 7 is used. In the second stage, the α-cut method is applied to all the exogenous variables and error terms. In the third stage, the fuzzy values are converted to crisp output values using Equation 9, where $K^{l}_{sp\alpha}$ and $K^{u}_{sp\alpha}$ represent the lower and upper bounds, respectively. Five types of defuzzified values can be generated in this stage, where K in $A_K$ takes the values K = 1, 2, 3, 4 and 5. The values 1, 2, 3, 4 and 5 represent
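The three-stage procedure can be sketched in Python for a single variable. Equation 9 is not reproduced here, so this sketch assumes a simple midpoint rule (the average of the lower and upper α-cut bounds) as the defuzzifier and hypothetical left and right spreads for the TFNs; it illustrates the pipeline only and is not the authors' formula. All function names are our own.

```python
def fuzzify(data, left_spread, right_spread):
    """Stage 1: represent each crisp x as the TFN (x - left_spread, x, x + right_spread)."""
    if left_spread < 0 or right_spread < 0:
        raise ValueError("spreads must be non-negative")
    return [(x - left_spread, x, x + right_spread) for x in data]


def alpha_cut_bounds(tfns, alpha):
    """Stage 2: lower and upper bounds of the alpha-cut of each TFN (l, m, u)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(l + alpha * (m - l), u - alpha * (u - m)) for l, m, u in tfns]


def defuzzify_midpoint(bounds):
    """Stage 3 (assumed rule): crisp output = average of the lower and upper bounds."""
    return [(lower + upper) / 2.0 for lower, upper in bounds]


def fuzzy_preprocess(data, alpha, left_spread, right_spread):
    """Chain the three stages for one variable (e.g. an exogenous regressor)."""
    tfns = fuzzify(data, left_spread, right_spread)
    return defuzzify_midpoint(alpha_cut_bounds(tfns, alpha))


crisp = [1.0, 2.5, 4.0]
for alpha in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(alpha, fuzzy_preprocess(crisp, alpha, left_spread=0.2, right_spread=0.4))
```

With asymmetric spreads the defuzzified value is shifted by (1 − α)(right_spread − left_spread)/2, so it coincides with the crisp value only as α → 1, in line with Theorem 1.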

Consistency and Efficiency of FSPSSM
To obtain a consistent estimator of the FSPSSM in Equation 1, the error terms are assumed to follow a normal distribution. The model proposed by Nawata (1994), hybridised with the fuzzy concept, is considered, and the Monte Carlo simulation technique (Kabadayi, 2004; Rana et al., 2008; Witchakul et al., 2008) is used to illustrate the developed model. Conversely, the estimators are inconsistent if the error terms are not normally distributed (Chamberlain, 1986; Robinson, 1988; Powell et al., 1989; Cosslett, 1990; Ichimura and Lee, 1991; Newey et al., 1990; Vella, 1992; Ichimura, 1993; Schafgans, 1996; Markus, 1998). In the development of the SPSSM, one of the elements used to measure the consistency or efficiency of the parameter estimates is the bandwidth parameter c (for instance, Chamberlain, 1986; Powell et al., 1989; Andrews, 1991; Cosslett, 1990; Ahn and Powell, 1993; Klein and Spady, 1993; Schafgans, 1996; Das et al., 2003; Bellemare et al., 2002). According to Hardle (1990), the bandwidth parameter is a scalar argument to the kernel function that determines what range of nearby data points will be heavily weighted in making an estimate. The choice of bandwidth represents a trade-off between bias (which is intrinsic to a kernel estimator and increases with bandwidth) and the variance of the estimates (which decreases with bandwidth). Here, an estimator is regarded as efficient if the RMSE values become smaller as the bandwidth parameter c increases and as N increases.
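The role of the bandwidth as a scalar argument to the kernel can be illustrated with a generic Nadaraya-Watson smoother in Python. This is not the Powell estimator; it only shows, on noiseless hypothetical data from y = x², how a small bandwidth weights only the nearest points while a large bandwidth averages over all of them and pulls the estimate away from the true value (the bias side of the trade-off; the variance side needs noisy data to appear).

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at x0 with a Gaussian kernel of width `bandwidth`."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no data within reach of x0 for this bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]  # true curve y = x^2, so the value at x0 = 2 is 4
for c in (0.1, 0.5, 0.75, 1.0, 100.0):
    print(c, round(nadaraya_watson(2.0, xs, ys, c), 4))
```

As c grows, the estimate at x0 = 2 drifts from 4 towards the overall mean of ys, 6.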

The Monte Carlo Simulation of Fuzzy Semi-Parametric Sample Selection Model
As mentioned earlier, to achieve the second aim of this study, i.e., consistency under the FSPSSM, the Monte Carlo simulation developed by Nawata (1994) is used, with α-cuts of 0.2, 0.4, 0.6 and 0.8. In this section, the effectiveness of the proposed model is assessed through the bandwidth parameter c. The FSPSSM with the DWADE and Powell estimators, hybridised with the design of Nawata (1994), can therefore be rewritten according to Equation 1 (Newey et al., 1990). For all cases, the number of replications is 1,000, and the true value of the parameter γ_1 is 1.
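The layout of the simulation design (sample sizes, α-cuts, bandwidths and replications) can be sketched as a Python driver. The per-replication step (simulate, fuzzify, run the two-step estimator) is left as a user-supplied callable, because the data-generating process of Nawata (1994) is not reproduced here; `run_design`, `toy_estimator` and the seed are hypothetical, and the toy estimator is a placeholder that only demonstrates the bookkeeping.

```python
import itertools
import random


def run_design(estimate_once, sample_sizes=(100, 200, 500),
               alpha_cuts=(0.2, 0.4, 0.6, 0.8),
               bandwidths=(0.1, 0.5, 0.75, 1.0),
               replications=1000, seed=12345):
    """Collect replicated estimates for every (N, alpha-cut, c) cell of the design.

    estimate_once(n, alpha, c, rng) is a user-supplied (hypothetical) callable that
    simulates one data set of size n, fuzzifies it with the given alpha-cut, runs the
    two-step estimator with bandwidth c and returns the estimate of gamma_1.
    """
    rng = random.Random(seed)
    results = {}
    for n, alpha, c in itertools.product(sample_sizes, alpha_cuts, bandwidths):
        results[(n, alpha, c)] = [estimate_once(n, alpha, c, rng)
                                  for _ in range(replications)]
    return results


def toy_estimator(n, alpha, c, rng):
    """Placeholder only: true value 1 plus noise shrinking like 1/sqrt(n)."""
    return 1.0 + rng.gauss(0.0, 1.0) / n ** 0.5


grid = run_design(toy_estimator, replications=50)
print(len(grid), "design cells")
```

Using a single seeded generator makes the whole grid reproducible; the 3 × 4 × 4 design yields 48 cells, each later summarised by mean, S.D., MSE and RMSE.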

The Monte Carlo Simulation Result: The Fuzzy Semi-Parametric Sample Selection Model
The results of the Monte Carlo simulation of the FSPSSM with N = 100, 200 and 500 are presented in Tables 1 to 3, respectively. The first and second columns give the α-cuts (0.2, 0.4, 0.6, 0.8) and the bandwidth parameters c (0.1, 0.5, 0.75, 1), respectively. The remaining columns give the mean, the Standard Deviation (S.D.), the Mean Square Error (MSE) and the Root Mean Square Error (RMSE), respectively. To study consistency under the FSPSSM, only the Powell estimator is reported, because the DWADE estimate is computed and used inside the Powell estimator. The consistency results of the FSPSSM are obtained from these tables. Apart from consistency, the bandwidth parameters are also used to assess the efficiency of the estimated parameters; this is reported in Tables 1 and 3 through the S.D., MSE and RMSE values. For instance, Table 1 shows that when N is 100, α-cuts is 0. Similar results have been obtained (for N = 200 and 500) as in Table 2
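The four summary columns of the tables can be computed from the replicated estimates of one design cell as in the Python sketch below. Whether the study's S.D. uses an n or n − 1 denominator is not stated, so the sample (n − 1) version here is an assumption; MSE is taken as the average squared deviation from the true value, so it equals the squared bias plus the n-denominator variance.

```python
import math
import statistics


def summarize(estimates, true_value):
    """Mean, S.D., MSE and RMSE of replicated estimates of a scalar parameter."""
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # sample S.D. with an n - 1 denominator (assumed)
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return {"mean": mean, "sd": sd, "mse": mse, "rmse": math.sqrt(mse)}


summary = summarize([0.9, 1.1, 1.0, 1.2], true_value=1.0)
print(summary)
```

Read against the criteria used here, consistency shows up as the mean approaching the true value γ_1 = 1 as N grows, and efficiency as falling S.D., MSE and RMSE.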

DISCUSSION
Since Heckman (1979) introduced the sample selection model, it has received considerable attention (in parametric, semi-parametric and nonparametric forms) and has been used in many applications. However, researchers have not investigated this model in terms of uncertainty, despite the presence of uncertainty in the model. Thus, in this study, we introduced the fuzzy concept hybridised with the semi-parametric sample selection model. The fuzzy concept offers an alternative framework for handling the uncertainties in this model, particularly in the relationship between the endogenous and exogenous variables, which can impair the ability of the model to produce estimates that explain the actual situation of a phenomenon. This question had yet to be explored and is the main pillar of this study. The resulting model, the Fuzzy Semi-Parametric Sample Selection Model (FSPSSM), is the first of its kind to be developed using the fuzzy concept.

CONCLUSION
In this study, we examined the consistency of the FSPSSM under the normality assumption. Under this assumption, the effect of the correlation between the fuzzy variables (w̃ and x̃) and the effect of the correlation between the error terms (ε̃_i and ũ_i) were investigated. As a continuation, consistency in the FSPSSM using the bandwidth parameter introduced by Powell (1987) was also studied. A Monte Carlo simulation was used to examine the consistency of the FSPSSM under the normality assumption. The Monte Carlo simulation results reveal that consistency depends on the bandwidth parameter. When the bandwidth parameter c is increased through 0.1, 0.5, 0.75 and 1 and N increases from 100 to 200 and then to 500, the mean values approach the true parameter; according to Schafgans (1996), this indicates that the FSPSSM is consistent. The bandwidth analysis also reveals that the estimated parameter is efficient, i.e., the S.D., MSE and RMSE values become smaller as N increases. In particular, the estimated parameter becomes consistent and efficient as the bandwidth parameter approaches infinity, c→∞, as the number of observations tends to infinity, n→∞. This study focused on two areas: the fuzzy concept, particularly fuzzy numbers, applied to the semi-parametric sample selection model, which we coin the Fuzzy Semi-Parametric Sample Selection Model (FSPSSM); and Monte Carlo simulation to examine the effectiveness of the proposed model.
Building on the modelling approach proposed in this paper, future research could proceed in several directions. The fuzzy concepts defined in this study are based on the TFN and the α-cut method; future studies could therefore consider more advanced fuzzy numbers, such as S-shaped or bell-shaped numbers. Since relationships between the explanatory variables exist in the models, the linear programming-based method introduced by Tanaka et al. (1982) and Amri and Tularam (2012) could be explored, which may give a deeper understanding of the underlying structure of the models. Other mathematical tools, such as optimization theory, could also be explored.
The most significant idea of this research was to bring the fuzzy concept into the selectivity model. In general, this concept can serve as a platform for exploring new dimensions of these models. Further research could develop a fuzzy perspective on new paradigms of the selection model, such as nonparametric and semi-nonparametric methods, the properties and theoretical aspects of the selection model, the handling of weaker assumptions and the investigation of the "curse of dimensionality" using the fuzzy concept.
In the development of the FSPSSM, rule-based fuzzy logic could also be considered. This would produce output based on linguistic variables and linguistic modifiers, making the proposed model useful for quantifying the uncertainties in the models. It would also be interesting to determine whether the percentage contribution of each specific uncertain parameter to the overall uncertainty of the model can be identified.
In this study, we developed Monte Carlo simulations in the R language, and these simulations could be improved. Babuska and Verbruggen (1996) and others (Chandramohan and Kamalakkannan, 2014; Hussein and Nordin, 2014; Kareem et al., 2014; Kahtan et al., 2014; Sridharan and Chitra, 2014) noted that the modelling of complex systems will always remain an interactive approach. Thus, future studies could use other software packages or programming languages and incorporate a graphical interface. In this way, information such as parameter estimates could be readily used, benefiting decision makers and other interested parties. These methods could be useful in data mining, e.g., in credit default, healthcare, security and agricultural analysis.

ACKNOWLEDGMENT
The author wishes to thank the Ministry of Higher Education, Malaysia, for sponsoring the studies and University Malaysia Terengganu (UMT) for its support and encouragement of study at the PhD level.