Empirical Likelihood Estimation Based on Simulated Moment Conditions

The empirical likelihood (EL) method runs into a critical problem when the objective function to be optimized cannot be computed or is not differentiable, which happens when the moment condition is highly nonlinear or discrete. We deal with this issue by following the method of simulated moments (MSM), introduced by Pakes and Pollard (1989) and McFadden (1989), to obtain a computable objective function, and we use importance sampling to smooth discrete moment conditions. We demonstrate the consistency and asymptotic normality of the empirical likelihood estimator based on the simulated moment conditions.


Introduction
Recently the empirical likelihood (EL) method has become increasingly popular in statistics and econometrics as an alternative to GMM, owing to its desirable higher-order properties; see Owen (2000) for a comprehensive introduction and Newey and Smith (2004) for higher-order asymptotics, among others. In this paper we contribute to the literature by addressing how EL deals with non-standard moment conditions of the form

    E[g(x; θ₀)] = 0,

where x is the observed data, θ₀ is the parameter to be estimated, and g is non-standard in the sense that it is difficult to compute or may even be non-smooth. In this case both the generalized method of moments (GMM) and EL are difficult to apply, because both require explicit calculation of the sample analogue of the moment condition and the existence of the derivative of g(x; θ) with respect to θ. To overcome this problem, the methodology of the paper is as follows. First, we apply the method of simulated moments (MSM), introduced by Pakes and Pollard (1989) (hereafter PP) and McFadden and Ruud (1994) (hereafter MR), to empirical likelihood: we simulate the moment condition wherever it is hard to compute, thereby extending MSM to a broader class of applications. A further contribution is the use of importance sampling; that is, we replace the original moment condition by one obtained via simulation with observations drawn from a different probability distribution that is relatively easy to handle. Moreover, as McFadden (1989) points out, importance sampling can be used to smooth discrete moment conditions, so our estimation method extends to the more general case in which the moment conditions may even be discrete. Finally, we form the EL objective function based on the simulated moment condition and use a Taylor expansion of the first-order derivative of the objective function to establish the consistency and asymptotic normality of the solution (the estimator).

Empirical Likelihood with Non-Standard Moment Conditions
Consider the following moment condition model:

    E[g(x; θ₀)] = 0,                                  (1)

where x is the observed data, θ₀ ∈ Θ ⊂ ℝ is the parameter to be estimated, and g is a real function. Following well-established procedures (e.g., Qin and Lawless (1994) and Newey and Smith (2004)), the EL estimator based on (1) is defined as

    θ̂ = arg min_{θ∈Θ} ℓ(θ),                          (2)

where

    ℓ(θ) = max_λ (1/N) Σ_{n=1}^N log(1 + λ′g(x_n; θ)),    (3)

and λ is a vector of Lagrange multipliers. A problem in empirical likelihood estimation of θ by minimizing (3) is that g(·) is sometimes intractable, i.e., not available in explicit form, so that we can compute neither its sample analogue nor its derivative. Another situation is that g(·) is not continuous in θ, whereas empirical likelihood estimation usually assumes that g(·) is continuous and differentiable in the parameter of interest, so that the consistency of the EL estimator can be demonstrated (see, e.g., Assumption 1 of Newey and Smith (2004)). These situations can be summarized as follows.
Problems arise because ĝ(x; θ) is either intractable or not continuous in θ. To overcome these problems in GMM, Pakes and Pollard (1989) considered simulating a good estimate of the moment instead of using g(θ) directly. Specifically, let G_n(θ) be a simulation of E[g(x; θ)] and let θ̃ be the GMM estimator based on G_n(θ); the conditions under which θ̃ converges to θ₀ are described in the following theorem.

Theorem 1 (cf. Theorem 3.1 of Pakes and Pollard (1989)) Suppose that
a. ‖G_n(θ̃)‖ ≤ o_p(1) + inf_{θ∈Θ} ‖G_n(θ)‖;
b. ‖G_n(θ₀)‖ = o_p(1);
c. for every δ > 0, sup_{‖θ−θ₀‖>δ} ‖G_n(θ)‖⁻¹ = O_p(1);
then θ̃ converges in probability to θ₀, where ‖·‖ is some norm on Θ.

Remarks
The intuition behind these conditions is to require that the simulation G_n(θ) be as close to E[g(x; θ)] as possible. Specifically:
a. G_n(θ) evaluated at the estimator θ̃ cannot be much bigger than the smallest value of G_n(θ) over Θ.
b. G_n(θ) evaluated at the true parameter θ₀ cannot be much bigger than zero.
c. G_n(θ) evaluated outside some neighborhood of θ₀ should be large.
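As a concrete illustration of conditions a–c (our own toy example, not from the paper), the following sketch builds a frequency simulator G_n(θ) for a binary moment whose probability we pretend is intractable: the probability Φ(θ) is replaced by the fraction of S simulated draws satisfying the event. All names and the model are assumptions for illustration. Note that the frequency simulator is a step function of θ, i.e., discontinuous, which is exactly the kind of moment the paper targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_moment(y, theta, S=500):
    """Frequency simulator G_n(theta) for the moment E[y - 1{eps <= theta}]."""
    eps = rng.standard_normal(S)          # simulation draws
    prob_sim = np.mean(eps <= theta)      # simulated estimate of Phi(theta)
    return np.mean(y) - prob_sim          # unbiased simulator of E[g(y, theta)]

theta0 = 0.3
# data generated so that P(y = 1) = Phi(theta0)
y = (rng.standard_normal(2000) <= theta0).astype(float)

# condition b: G_n evaluated at theta0 should be close to zero
print(abs(simulated_moment(y, theta0)))
```

Because G_n(θ) here jumps as θ crosses simulated draws, a derivative-based GMM or EL procedure cannot be applied directly, which motivates the smoothing via importance sampling discussed next.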
To use the results of this theorem in EL, we consider a specific simulation method, importance sampling, which is introduced in the next section.

Importance Sampling
Importance sampling is a simulation method for estimating an integral with respect to one probability distribution using draws from a different distribution. Suppose we want to evaluate the integral

    E_p[g(x)] = ∫ g(x) p(x) dx,

where g is a function of x and p(x) is the density of x. If it is difficult to sample from p(x), we can choose another probability distribution Q(x) with density q(x), called the importance function, which has the same support as p(x). Then

    E_p[g(x)] = ∫ g(x) w(x) q(x) dx = E_q[g(x) w(x)],    (5)

where w(x) = p(x)/q(x) is called the importance weight (also the inverse likelihood ratio). Note that w(x) is always positive, E_q[w(x)] = 1, and this weight function reflects the important regions of the sampling space. A special case is q(x) = p(x), for which w(x) = 1.
Equation (5) motivates an unbiased estimator of E_p[g(x)]: sample S independent values x_1, …, x_S from Q(x) and calculate

    Ẽ_p[g(x)] = (1/S) Σ_{s=1}^S g(x_s) w(x_s).    (7)

Note that g(x)w(x) is an unbiased estimator of E_p[g(x)] by construction, with the expectation taken with respect to q(x). It is also of interest to examine the expectation of g(x)w(x) with respect to p(x). In general it depends on the choice of q(x), but in some circumstances this expectation can be bounded by a function that does not depend on the choice of q(x). This bound (Lemma 1 below) will be useful later; its proof follows directly from the Hölder inequality, where ‖·‖₁ denotes the norm in L¹ space.
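A minimal numerical sketch of the estimator in (7), with p, q, and g chosen purely for illustration (p = N(0, 1), q = N(0, 4), g(x) = x²), so that the true value E_p[g(x)] = 1 and the property E_q[w(x)] = 1 can both be checked directly:

```python
import numpy as np

rng = np.random.default_rng(1)

def norm_pdf(x, sigma=1.0):
    """Density of N(0, sigma^2)."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

S = 100_000
xs = 2.0 * rng.standard_normal(S)            # S draws from q = N(0, 4)
w = norm_pdf(xs, 1.0) / norm_pdf(xs, 2.0)    # importance weights w = p/q

est = np.mean(xs**2 * w)                     # (1/S) sum g(x_s) w(x_s), as in (7)
print(est)                                   # close to E_p[x^2] = 1
print(np.mean(w))                            # sanity check: E_q[w(x)] = 1
```

Choosing q with heavier tails than p, as here, keeps the weights w(x) bounded and the estimator's variance finite, which is the practical counterpart of the L¹ bound mentioned above.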

Large Sample Results
Now we replace E[g(x; θ)] in the original model (1) by its simulated version computed via (7) through importance sampling, and define g̃(x_n; θ) as the resulting simulated moment indicator, with

    g̃(θ) = (1/N) Σ_{n=1}^N g̃(x_n; θ).

As mentioned above, Ẽ_p[g(x; θ₀)] is unbiased, so E[g̃(x_n; θ₀)] = 0, and the simulated moment can therefore be used as a new moment condition to estimate θ₀. Counterparts constructed from g(x; θ) are defined analogously and denoted without the accent, e.g., g(θ) = (1/N) Σ_{n=1}^N g(x_n; θ). To apply the results of Theorem 1, we define the empirical likelihood estimator θ̃ as the solution to the problem

    θ̃ = arg min_{θ∈Θ} max_λ (1/N) Σ_{n=1}^N log(1 + λ′g̃(x_n; θ)),

where λ is a vector of Lagrange multipliers, a function of θ implicitly defined through

    (1/N) Σ_{n=1}^N g̃(x_n; θ) / (1 + λ′g̃(x_n; θ)) = 0;

see, e.g., Qin and Lawless (1994). For the general asymptotic properties of the empirical likelihood estimator, we make the following regularity assumptions.
Assumption 1 θ₀ ∈ int(Θ), and Θ is a compact subset of ℝ.

Assumption 2 For any δ > 0, sup_{‖θ−θ₀‖>δ} ‖g̃(θ)‖⁻¹ = O_p(1).

Furthermore, we need a smoothing condition for uniform convergence. Define the simulation residual process

    ω(θ) = g̃(θ) − g(θ).

Assumption 3 The process ω(θ) is stochastically equicontinuous at θ₀; i.e., for any ε > 0 there exists a neighborhood U of θ₀ such that lim sup_{N→∞} P(sup_{θ∈U} ‖ω(θ) − ω(θ₀)‖ > ε) < ε.

The following theorem demonstrates the consistency of θ̃ by checking conditions similar to those given in Theorem 1.
Theorem 2 Given Assumptions 1–3, θ̃ converges in probability to θ₀.

Proof. The first step is to show that g̃(θ) is large outside some neighborhood of θ₀, which follows from the identification of θ₀. To see this, note from the triangle inequality that ‖g̃(θ)‖ ≥ ‖g(θ)‖ − ‖ω(θ)‖, where θ̇ lies between θ₀ and θ. According to Lemmas A1 and A2 of Newey and Smith (2004), λ = O_p(N^{−1/2}) and 1/(1 + λ̇′g̃(x_n; θ))² ≥ 1/2. Thus from (16) and the first step we obtain a bound on g̃(θ̃). Now, from the definition of θ̃, solving g̃(θ̃) out of (17) implies that θ̃ must lie within the neighborhood of θ₀ of radius δ, since g̃(θ) is continuous. The convergence follows since δ can be made arbitrarily small.
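To make the estimator θ̃ concrete, here is a hedged end-to-end sketch under a toy setup of our own (not the paper's application): the model is x ~ N(θ, 1) with moment g(x, θ) = x − θ, the model mean is replaced by a simulated one in the spirit of (7), the multiplier λ(θ) is obtained by bisection on the first-order condition (1/N) Σ g_n/(1 + λg_n) = 0, and θ̃ minimizes the profiled EL objective over a grid.

```python
import numpy as np

rng = np.random.default_rng(2)
x = 0.5 + rng.standard_normal(400)       # data with true theta0 = 0.5

def solve_lambda(g):
    """Bisection for the scalar Lagrange multiplier; assumes g takes both signs."""
    lo = -1.0 / g.max() + 1e-6           # keep 1 + lambda * g_n > 0 for all n
    hi = -1.0 / g.min() - 1e-6
    for _ in range(200):                 # the FOC is strictly decreasing in lambda
        mid = 0.5 * (lo + hi)
        if np.mean(g / (1.0 + mid * g)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def el_objective(theta, S=2000):
    """Inner maximum over lambda of the EL objective at the simulated moment."""
    mean_sim = theta + np.mean(rng.standard_normal(S))   # simulated E[x] under theta
    g = x - mean_sim                                     # simulated moment g~(x_n, theta)
    lam = solve_lambda(g)
    return np.sum(np.log1p(lam * g))     # zero iff the sample moment is exactly matched

grid = np.linspace(0.0, 1.0, 101)
theta_hat = grid[np.argmin([el_objective(t) for t in grid])]
print(theta_hat)                         # should be near the sample mean of x
```

The grid search stands in for the Taylor-expansion argument of the theorem: the profiled objective is zero when the simulated moment is exactly balanced and grows quadratically away from that point, so its minimizer tracks θ₀ up to sampling and simulation noise.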
Theorem 3 Given Assumptions 1–3, √N(θ̃ − θ₀) is asymptotically normal.

Proof. First we show that √N(θ̃ − θ₀) is stochastically bounded. Since g̃(θ̃) = o_p(1), we have C_N θ̂ = O_p(1), and expanding C_N θ̃ together with the consistency of θ̃ gives the bound. The remainder of the proof, based on Theorem 1, is similar to Parente and Smith (2008). Define m̃_n(θ) as the simulated moment indicator, and let G_n(θ) = ∂m̃_n(θ)/∂θ, G(θ₀) = (1/N) Σ_{n=1}^N G_n(θ₀), and Ω̃_n = (1/N) Σ_{n=1}^N m̃_n(θ₀) m̃_n(θ₀)′. Expanding the first-order condition for the saddle-point problem defined above around θ₀ and λ₀ = 0 yields (20) and (21), which together imply the asymptotic distribution. Note that Lemma 1 bounds the relevant simulation moments, and the variance terms follow from the i.i.d. assumption and unconditional simulation. Next we show that θ̃ and θ̇ are asymptotically equivalent. The definition of θ̃, together with an expansion similar to (17), gives g̃(θ̃) − g̃(θ̇) = o_p(1); thus, by the continuity of g̃, θ̃ = θ̇ + o_p(1).

Discussion of the asymptotic results:
1. The consistency result also holds if Ẽ_p[g(x; θ₀)] is a biased estimator of E_p[g(x; θ₀)].
3. It turns out that the asymptotic variance-covariance matrix of θ̃ does not depend on the choice of importance function q(·) but on the number of simulations S; this is the case MR call unconditional simulation. As S goes to infinity, the disturbance from simulation vanishes, and θ̃ becomes asymptotically equivalent to the usual EL estimator.
4. These asymptotic results are similar to those McFadden and Ruud (1994) obtained for the GMM estimator. The covariance matrix of their estimator is larger than that of the usual GMM estimator because of simulation, which differs slightly from the covariance matrix of our EL estimator. However, both proofs aim to show that the simulated moment indicator, evaluated at the true parameter and at the estimator, satisfies conditions similar to those in the proof of Theorem 3.1 of Pakes and Pollard (1989).
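Discussion point 3 — the simulation disturbance vanishing as S grows — can be checked numerically with a small Monte Carlo of our own design (the mean model again, purely for illustration): the spread of the simulated sample moment across replications shrinks toward its no-simulation level as S increases, consistent with an asymptotic variance that depends on S but not on q.

```python
import numpy as np

rng = np.random.default_rng(3)
N, theta0 = 200, 0.0

def sim_moment_mean(S):
    """One replication of the simulated sample moment for x ~ N(theta0, 1)."""
    x = theta0 + rng.standard_normal(N)                   # fresh data sample
    mean_sim = theta0 + np.mean(rng.standard_normal(S))   # simulated model mean
    return np.mean(x) - mean_sim                          # simulated moment at theta0

# compare the Monte Carlo spread for few vs. many simulation draws;
# the theoretical variance in this toy model is 1/N + 1/S
for S in (5, 5000):
    draws = np.array([sim_moment_mean(S) for _ in range(2000)])
    print(S, draws.std())
```

With S = 5 the simulation term 1/S dominates the spread; with S = 5000 the spread is close to the sampling-only level 1/√N, mirroring the claim that the estimator approaches the usual EL estimator as S → ∞.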

Conclusion
We have presented EL estimation with an intractable moment condition, and we have also noted that simulation by importance sampling can be used to smooth moment conditions that are discontinuous in the parameter. This differs from the approach of Parente and Smith (2008): rather than simulating the moment indicator, they place different assumptions on it to ensure that the EL estimator has standard first-order asymptotic properties.
It is important to note that the asymptotic results for our estimator rely heavily on i.i.d. assumptions on the observations and simulations; for time series models our EL estimator may fail, since the usual conditions for uniform convergence and the law of large numbers will no longer be satisfied. To use EL with simulated moment conditions for dependent data through importance sampling, additional assumptions on stochastic convergence (see, e.g., Pollard (1984) and Chapter 4 of Billingsley (1999)) would have to be added, and the choice of importance function would also have to be considered carefully, so that the simulated moments satisfy the required conditions. These are directions for our further research.