THE EFFICIENCY OF EMPIRICAL LIKELIHOOD WITH NUISANCE PARAMETERS


INTRODUCTION
Likelihood inference may have drawbacks when estimating a parameter of interest in the presence of nuisance parameters. For example, Neyman and Scott (1948) considered this problem and found that maximum likelihood estimation can be either inconsistent or inefficient when there are many nuisance parameters. This study deals with empirical likelihood (EL), a nonparametric analogue of maximum likelihood, in the presence of nuisance parameters combined with the selection of moment conditions. We show that, when nuisance parameters are present, the asymptotic efficiency of the EL estimator of the parameter of interest can be increased by adding more moment conditions, in the sense that the difference of the information matrices is positive semidefinite. In particular, we focus on a special case in which the nuisance parameters appear in only some of the moment conditions. This case leads to an important result: an added moment condition can increase the asymptotic efficiency only if it is not orthogonal to the original moment conditions.

MOMENT CONDITION WITH NUISANCE PARAMETERS
Consider the moment condition in Equation (1), E[g(x; θ_0)] = 0, where θ is a p×1 parameter with true value θ_0, g is an m×1 vector of real functions and the expectation is taken with respect to F. We consider the over-identified case where m≥p. Unlike Qin and Lawless (1994), we do not assume that the m functions in g are independent, since the correlation between these functions plays an important role in asymptotic efficiency, as discussed in the following section. Now suppose the parameter θ can be decomposed as θ = (β', φ')'. If we are interested only in β and not in φ, then φ is a nuisance parameter in the model, and we write the corresponding moment condition as Equation (2), for the true value β_0 of β. The empirical likelihood ratio statistic for this model, R(β, φ), is given by Equation (3), where λ is an m×1 vector of Lagrange multipliers, which is a continuously differentiable function of (β', φ')' (see, e.g., Qin and Lawless, 1994) and is determined by Equation (4). To simplify notation, let:

Assumption 1
θ_0 = (β_0, φ_0) solves E[g(x; θ)] = 0 uniquely, or equivalently, both β_0 and φ_0 are strongly identified.
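The inner step that determines λ at a fixed parameter value can be sketched numerically. The snippet below is a minimal illustration, not the procedure used in this study. It assumes the standard formulation of Qin and Lawless (1994), in which λ solves Σ_i g_i/(1 + λ'g_i) = 0 and the log empirical likelihood ratio is Σ_i log(1 + λ'g_i); scaling conventions, such as a factor of 2, vary across references. The function name el_log_ratio and the damped Newton iteration are illustrative choices.

```python
import numpy as np


def el_log_ratio(G, tol=1e-10, max_iter=50):
    """Return (lam, log_ratio) for moment values G at a fixed parameter.

    G is an (n, m) array whose i-th row is g(x_i; theta).
    lam maximises the concave function sum_i log(1 + lam' g_i), so it
    satisfies sum_i g_i / (1 + lam' g_i) = 0 (assumed standard form).
    """
    n, m = G.shape
    lam = np.zeros(m)
    obj = 0.0  # value of sum_i log(1 + lam' g_i) at lam = 0
    for _ in range(max_iter):
        d = 1.0 + G @ lam
        W = G / d[:, None]
        grad = W.sum(axis=0)                  # first-order condition
        if np.linalg.norm(grad) <= tol * n:
            break
        step = np.linalg.solve(W.T @ W, grad)  # Newton ascent direction
        t = 1.0
        while True:
            cand = lam + t * step
            d_new = 1.0 + G @ cand
            if np.all(d_new > 0):             # implied weights stay positive
                obj_new = np.log(d_new).sum()
                if obj_new >= obj:            # accept only non-decreasing steps
                    break
            t *= 0.5                          # otherwise halve the step
        lam, obj = cand, obj_new
    return lam, obj
```

With g(x; µ) = x − µ evaluated at the sample mean, the function returns λ = 0 and a log ratio of zero, since the equal weights 1/n already satisfy the moment condition. At the solution the implied weights 1/(n(1 + λ'g_i)) sum to one.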

Remark 1
Assumption 1, combined with m≥p, makes the parameter well identified. Stock and Wright (2000) considered the problem of weak identification by assuming that the subvector β of θ is completely identified but φ is not, in the sense that the population moment function is steep in β around β_0 but nearly flat in φ. This idea provides a framework for analyzing problems that mix nuisance parameters, weak identification and partial identification (Phillips, 1989).
We derive the properties of the EL estimator of β_0 in the next theorem.

Remark 2
The structure of the asymptotic variance-covariance matrix V_β^1 is different from that in Stock and Wright (2000), who decompose the population moment function into two components, where m_1(θ) involves both parameters and m_2(β) involves β and the true value of φ. Lazar and Mykland (1999) consider higher-order properties of the estimator of β through an Edgeworth expansion of R(β, φ). They find that, in the presence of nuisance parameters, it may not achieve the higher-order accuracy that can be obtained by ordinary likelihood. They also show that the empirical likelihood ratio statistic does not admit Bartlett correction, unlike the case without nuisance parameters.

MORE MOMENT CONDITIONS
Now we focus on the asymptotic efficiency of the estimator of β when more moment conditions are added. Suppose that, based on model (1), we obtain the new model in Equation (9) by adding one more moment indicator f(·). For further notation, we define: In this model, following the setup in the previous section, the parameter vector information given by f. Let the estimator of β based on both g and f be denoted by β̃ and the corresponding covariance matrix by V_β^2. In general, well-established results show that using f is at least not harmful, i.e., it will not increase the asymptotic variance of θ̃. Nor will dropping f decrease the asymptotic variance of the estimator relative to that of the estimator based on both g and f; see Corollary 1 of Qin and Lawless (1994).

Remark 3
A similar and relevant situation is worth mentioning, which is described in Newey and Windmeijer (2005) and Han and Phillips (2006), for instance. They assume that the number of moment conditions increases with the sample size. In that case extra information is provided by both extra data and extra moment conditions, while in our case it is provided only by the latter, with the sample size n fixed. They also allow the moment conditions to be weak, while we assume that both g and f are strong, as indicated in Assumption 1. Estimation under many weak moment conditions is also discussed by Andrews and Stock (2005).

Proposition 1
The asymptotic efficiency of the EL estimator of β can be increased by adding more moment conditions.

Proof
Since we can always partition the components of the moment vector into blocks, for simplicity and without loss of generality we assume that both g and f are one-dimensional.
For convenience, let: The inverse of V_β^2, or the information matrix of β with both g and f, is given by Equation (10). Comparing this with the information on β without f, the difference is positive semidefinite, provided E(gg') is positive definite, as Assumption 2 indicates.
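The comparison of information matrices can be made explicit with a standard partitioned-inverse identity. The following is a sketch under assumptions that are not spelled out above: f depends on the same parameter vector θ = (β', φ')' as g and introduces no new parameters, the stacked second-moment matrix is positive definite, and g alone identifies θ. The symbols D_g, D_f, Ω_gg, Ω_gf, Ω_ff, S, I_g and I_h are notation introduced here for illustration only.

```latex
\begin{aligned}
h &= (g', f')', \qquad
D_g = E\!\left[\frac{\partial g}{\partial\theta'}\right], \quad
D_f = E\!\left[\frac{\partial f}{\partial\theta'}\right], \quad
\Omega = E(hh') =
\begin{pmatrix}\Omega_{gg} & \Omega_{gf}\\ \Omega_{fg} & \Omega_{ff}\end{pmatrix},\\[4pt]
S &= \Omega_{ff} - \Omega_{fg}\Omega_{gg}^{-1}\Omega_{gf},\\[4pt]
I_h - I_g
 &= \begin{pmatrix} D_g' & D_f' \end{pmatrix}\Omega^{-1}
    \begin{pmatrix} D_g \\ D_f \end{pmatrix}
    - D_g'\Omega_{gg}^{-1}D_g \\
 &= \left(D_f - \Omega_{fg}\Omega_{gg}^{-1}D_g\right)' S^{-1}
    \left(D_f - \Omega_{fg}\Omega_{gg}^{-1}D_g\right) \succeq 0 .
\end{aligned}
```

Since I_h ⪰ I_g implies I_h^{-1} ⪯ I_g^{-1} whenever both are invertible, the β block of the asymptotic covariance cannot increase when f is added. The gain for θ vanishes exactly when D_f = Ω_fg Ω_gg^{-1} D_g.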

Example 1
Suppose we have a sequence x_1, …, x_n of i.i.d. observations of a univariate random variable x. Let E(x) = µ and var(x) = σ². Thus we have the two moment conditions in Equations (11) and (12). Suppose now that we are interested only in the estimation of µ. The empirical likelihood estimator of µ is given by Equation (13). Notice that without g_2, nVar(µ̂) equals σ².
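The variance in this example can be checked with a short calculation. The sketch below assumes the centered forms g_1 = x − µ and g_2 = (x − µ)² − σ², which are one natural reading of the two moment conditions, and writes µ_3 and µ_4 for the third and fourth central moments of x (notation introduced here).

```latex
\begin{aligned}
D &= E\!\left[\frac{\partial (g_1, g_2)'}{\partial(\mu, \sigma^2)}\right]
   = \begin{pmatrix} -1 & 0 \\ -2E(x-\mu) & -1 \end{pmatrix}
   = -I_2, \qquad
\Omega = E(gg') =
\begin{pmatrix} \sigma^2 & \mu_3 \\ \mu_3 & \mu_4 - \sigma^4 \end{pmatrix},\\[4pt]
V &= \left(D'\Omega^{-1}D\right)^{-1} = \Omega
\quad\Longrightarrow\quad
n\,\mathrm{Var}(\hat\mu) \to \sigma^2 .
\end{aligned}
```

Under these assumptions the system is just-identified, so the estimator of µ reduces to the sample mean, and its asymptotic variance with both moment conditions is σ² whatever the value of µ_3.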
In the above example, a special feature of the moment conditions simplifies the calculation dramatically. We therefore consider the following more special model, Equation (14), in which g contains no nuisance parameter while f contains only the nuisance parameter, although f still brings some information from the data. The gradient vector of h in (14) and the resulting information on β are expressed with I denoting the identity matrix of the corresponding dimension, which leads to Equations (15) and (16). By assumption, E(gg') is positive semidefinite, which leads to the following result.

Proposition 2
Additional moment conditions that contain only nuisance parameters provide extra information on the parameter of interest only if they are correlated with the original moment conditions.
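Proposition 2 can be illustrated numerically. The sketch below assumes the usual first-order information formula D'Ω^{-1}D for the stacked moment vector h = (g', f')', with D = E[∂h/∂θ'] and Ω = E(hh'), and obtains the information on β as the Schur complement with respect to the nuisance block. The function name information_on_beta and the numerical values of D and Ω are hypothetical, chosen only for illustration.

```python
import numpy as np


def information_on_beta(D, Omega, p_beta):
    """Information on the first p_beta parameters, nuisance block profiled out.

    D     : (m, p) expected Jacobian of the stacked moments (assumed known)
    Omega : (m, m) second-moment matrix of the stacked moments
    """
    info = D.T @ np.linalg.solve(Omega, D)      # D' Omega^{-1} D
    A = info[:p_beta, :p_beta]                  # beta block
    B = info[:p_beta, p_beta:]                  # beta / nuisance cross block
    C = info[p_beta:, p_beta:]                  # nuisance block
    if C.size == 0:                             # no nuisance parameters
        return A
    return A - B @ np.linalg.solve(C, B.T)      # Schur complement


def stacked_omega(rho):
    # Hypothetical E(hh'): scalar g, two-dimensional f,
    # with correlation rho between g and the first component of f.
    return np.array([[1.0, rho, 0.0],
                     [rho, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])


# g depends only on beta; both components of f depend only on phi.
D_over = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 1.0]])

info_g_only = information_on_beta(np.array([[1.0]]), np.array([[1.0]]), 1)
info_uncorr = information_on_beta(D_over, stacked_omega(0.0), 1)
info_corr = information_on_beta(D_over, stacked_omega(0.5), 1)

# f of the same dimension as phi (just-identified nuisance), correlated with g.
info_just = information_on_beta(np.eye(2),
                                np.array([[1.0, 0.5], [0.5, 1.0]]), 1)
```

In this toy configuration the information on β from g alone is 1. Adding two nuisance-only moments that are uncorrelated with g leaves it at 1, while a correlation of 0.5 between g and the first component of f raises it to 8/7. When f has the same dimension as φ, the information stays at 1 even under correlation, which is consistent with correlation being a necessary rather than a sufficient condition.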

Remark 4
Whether g and f are correlated is a testable condition. Since E[g(x, β_0)] = E[f(x, φ_0)] = 0, testing whether g and f are correlated is equivalent to testing the additional moment condition in Equation (17), E[g(x, β_0) f(x, φ_0)'] = 0. This can be done by the standard EL test procedure.

CONCLUSION
In this study we have discussed the efficiency of the EL estimator in the presence of nuisance parameters, using standard asymptotic methods. We are particularly interested in whether the asymptotic efficiency of the estimator of the parameter of interest can be improved by adding more moment conditions. We found that a necessary condition for an added moment condition to improve the asymptotic efficiency is that it is correlated with the original moment conditions. It is worth mentioning that here we incorporate more moment conditions with the sample size fixed, while Newey and Windmeijer (2005) and Han and Phillips (2006) let the number of moment conditions increase with the sample size.
For future research, it would be worth examining the efficiency of the EL test with nuisance parameters, which would extend the results of Wang (2013), where the large deviation efficiency of the EL test with weakly dependent data is established.