THE RATE FUNCTION OF THE S-MIXING PROCESS AND ITS APPLICATION IN EVALUATING EMPIRICAL LIKELIHOOD TESTS

In this study we show that the rate function of an S-mixing stationary process satisfying part of the hypermixing condition is equal to the Kullback-Leibler distance. This finding can be used to extend the optimality of the empirical likelihood test from the i.i.d. setting to the weakly dependent case, so that the Asymptotic Relative Efficiency (ARE) of the empirical likelihood test with strongly mixing data can be established.


INTRODUCTION
Large deviation results on random variables can be used to derive bounds on certain loss functions, so the efficiency of statistical inference can be assessed by large deviation probabilities, particularly in the case that both the loss function and the rate function of the large deviations are the Kullback-Leibler (K-L) distance between two probability measures, say m_1 and m_2:

H(m_1 || m_2) = ∫ log(dm_1/dm_2) dm_1;

see Bahadur (1967) and Dembo and Zeitouni (1998) (hereafter DZ), where the optimality of statistical inference with i.i.d. data is extensively discussed. Although in general the rate function for non-i.i.d. data is not equal to the K-L distance, the purpose of this note is to show that, combined with part of the hypermixing condition, the S-mixing stationary process has a rate function equal to the K-L distance; hence the optimality results on the empirical likelihood test with i.i.d. data (Kitamura, 2001) can be extended to the case of weakly dependent data.
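To fix ideas, for measures with a common finite support the K-L distance reduces to H(m_1 || m_2) = ∑_x m_1(x) log(m_1(x)/m_2(x)). The following minimal sketch (the helper name `kl_divergence` is ours, not from the literature) illustrates the two properties used repeatedly below: the distance vanishes when the measures coincide and is infinite when m_1 is not absolutely continuous with respect to m_2.

```python
import math

def kl_divergence(m1, m2):
    """Kullback-Leibler distance H(m1 || m2) between two probability
    measures on a common finite support, given as lists of point masses."""
    total = 0.0
    for p, q in zip(m1, m2):
        if p > 0.0:
            if q == 0.0:
                # m1 is not absolutely continuous w.r.t. m2
                return float("inf")
            total += p * math.log(p / q)
    return total

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical measures
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # strictly positive
print(kl_divergence([1.0, 0.0], [0.0, 1.0]))  # inf: no absolute continuity
```

Note that the convention 0 · log 0 = 0 is built in by skipping zero-mass points of m_1.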
Let ∑ be a compact topological space and {X_t : t ∈ ℤ} be a stationary stochastic process taking values in ∑. Following Bryc and Dembo (1996) (hereafter BD), {X_t : t ∈ ℤ} is said to be S-mixing if for any finite constant C < ∞ there exists a non-decreasing sequence ℓ(n) ∈ ℕ satisfying Equation 1.
BD point out that S-mixing is a fairly weak condition, since the α-mixing condition suffices for S-mixing, so it covers a quite general class of stochastic processes (see also DZ). BD also prove that S-mixing holds if the process satisfies the following two conditions, (H-1) and (H-2), which are sometimes called the hypermixing conditions.
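As a concrete member of this class, a stationary finite-state Markov chain is α-mixing with geometrically decaying coefficients, and hence S-mixing. The sketch below (the function name and transition probabilities are our illustrative choices) simulates a symmetric two-state chain started from its stationary law; by ergodicity the long-run average settles near the stationary mean.

```python
import random

def markov_path(n, stay=0.7, seed=7):
    """Simulate a stationary two-state Markov chain on {0, 1} that stays
    in its current state with probability `stay` and switches otherwise;
    the symmetric transition makes the uniform law stationary."""
    rng = random.Random(seed)
    x = rng.choice([0, 1])  # draw the initial state from the stationary law
    path = []
    for _ in range(n):
        if rng.random() >= stay:
            x = 1 - x
        path.append(x)
    return path

path = markov_path(50000)
print(sum(path) / len(path))  # long-run average, near the stationary mean 0.5
```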

Assumption 1 (H-1)
There exist ℓ < ∞ and constants α(ℓ) ≥ 1 such that, for all integers r ≥ k ≥ 2 and any pair of ℓ-separated bounded functions f and g (that is, f a function of X_1, …, X_k and g a function of X_{k+ℓ}, …, X_r),

| E[ f(X_1, …, X_k) g(X_{k+ℓ}, …, X_r) ] | ≤ ( E|f(X_1, …, X_k)|^{α(ℓ)} )^{1/α(ℓ)} ( E|g(X_{k+ℓ}, …, X_r)|^{α(ℓ)} )^{1/α(ℓ)}.

The following theorem of large deviations can be considered as an analogue of the Sanov theorem (Sanov, 1965) for S-mixing processes, with which we are able to evaluate large deviation probabilities of weakly dependent data. Let Q be the underlying probability measure of the whole process, let Q_n denote the n-th marginal of Q and, in particular, let Q_1 ∈ M_1(∑) be the probability measure of a single realization, where M_1(∑) denotes the space of Borel probability measures on ∑. Also define the empirical measure of a sample x_1, x_2, …, x_n to be

μ_n = (1/n) ∑_{i=1}^n δ_{x_i},

where δ_x denotes the Dirac measure at x.
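The empirical measure μ_n simply places mass 1/n at each observation, so for a finite sample it can be tabulated directly (the helper name is ours):

```python
from collections import Counter

def empirical_measure(sample):
    """Return mu_n as a dict {point: mass}, i.e. (1/n) times a Dirac
    mass at every observed point."""
    n = len(sample)
    return {x: c / n for x, c in Counter(sample).items()}

print(empirical_measure(["a", "b", "a", "a"]))  # {'a': 0.75, 'b': 0.25}
```

The masses always sum to one, so μ_n is itself an element of M_1(∑) supported on the sample.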

Theorem 1 (Bryc and Dembo, 1996)
If a stationary process {X_t : t ∈ ℤ} is S-mixing, the sequence of empirical measures satisfies the LDP with respect to the τ-topology in M_1(∑), and this LDP is governed by the good rate function

I(ν) = sup_{f ∈ B(∑, ℝ)} { ∫ f dν − Λ(f) },     (5)

i.e., for every set Γ ⊆ M_1(∑),

liminf_{n→∞} (1/n) log P_n(Γ) ≥ − inf_{ν ∈ Γ°} I(ν),     (6)

limsup_{n→∞} (1/n) log P_n(Γ) ≤ − inf_{ν ∈ Γ̄} I(ν),     (7)

where P_n denotes the distribution of μ_n, Γ° and Γ̄ denote the interior and the closure of Γ, and

Λ(f) = lim_{n→∞} (1/n) log E_Q[ exp( ∑_{i=1}^n f(X_i) ) ],     (8)

where the limit exists for every f ∈ B(∑, ℝ), the space of all bounded, real-valued, Borel measurable functions on ∑.
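In the special case of i.i.d. data, E_Q[exp(∑_{i=1}^n f(X_i))] = (E_{Q_1}[e^{f(X_1)}])^n, so the limit in (8) collapses to the single-letter form Λ(f) = log E_{Q_1}[e^{f(X_1)}]. The sketch below (the helper name is ours) evaluates this form for a finite-state marginal; under dependence the limit need not factor this way, which is exactly why Theorem 1 is needed.

```python
import math

def log_mgf(q1, f):
    """Single-letter Lambda(f) = log E_{Q1}[exp(f(X))] for a process with
    finite-state marginal q1 = {state: probability}; valid when the limit
    in (8) factors, as it does under independence."""
    return math.log(sum(p * math.exp(f(x)) for x, p in q1.items()))

q1 = {0: 0.5, 1: 0.5}
lam = log_mgf(q1, lambda x: float(x))  # f(x) = x
print(lam)                             # log((1 + e)/2)
print(log_mgf(q1, lambda x: 0.0))      # Lambda(0) = 0
```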
Regarding the rate function of this LDP, BD remark in their paper that I(ν) in general will be smaller than the corresponding K-L distance, but they do not provide a proof. However, we find that I(ν) is equal to the K-L distance if the S-mixing condition is combined with part of the hypermixing condition. This main result is presented in the next section.

THE RATE FUNCTION OF THE S-MIXING PROCESS
First, we introduce the following lemma from DZ.

Lemma 1
Given assumption (H-1), for γ > 0 we have

Λ(f) ≤ (1/(1+γ)) log E_{Q_1}[ exp( (1+γ) f(X_1) ) ];     (9)

see DZ for the proof. This lemma ensures that Λ(f) is bounded, and so is its Legendre transform I(ν) in (5).
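The mechanism behind the equivalence result can be checked numerically in the simplest i.i.d. setting: on a two-point space, the Legendre transform in (5) with the single-letter Λ(f) = log E_{Q_1}[e^f] recovers exactly the K-L distance (the Donsker-Varadhan variational formula). In the sketch below (helper names ours) the supremum is approximated by a grid search over f = (0, t), which loses no generality because shifting f by a constant leaves ∫ f dν − Λ(f) unchanged.

```python
import math

def kl(a, q):
    """H(nu || Q1) for two-point measures nu = (a, 1-a) and Q1 = (q, 1-q)."""
    out = 0.0
    if a > 0.0:
        out += a * math.log(a / q)
    if a < 1.0:
        out += (1.0 - a) * math.log((1.0 - a) / (1.0 - q))
    return out

def legendre_sup(a, q):
    """Approximate sup_f { integral of f d(nu) - log E_{Q1}[e^f] } by a
    grid search over functions f = (0, t), t in [-8, 8]."""
    best = -float("inf")
    for i in range(-800, 801):
        t = i / 100.0
        val = (1.0 - a) * t - math.log(q + (1.0 - q) * math.exp(t))
        best = max(best, val)
    return best

print(kl(0.3, 0.6), legendre_sup(0.3, 0.6))  # the two values agree closely
```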
Our following theorem shows the equivalence of the Kullback-Leibler distance H(· || Q_1) and the rate function of the LDP for S-mixing data.

Theorem 2
If assumptions 1 (H-1) and 3 are satisfied, the rate function I(ν) in Theorem 1 satisfies

I(ν) = H(ν || Q_1) for every ν ∈ M_1(∑).

Proof
First we show that I(ν) ≥ H(ν || Q_1). From Lemma 1 we have, for every f ∈ B(∑, ℝ) and γ > 0,

I(ν) ≥ ∫ f dν − Λ(f) ≥ ∫ f dν − (1/(1+γ)) log E_{Q_1}[ exp( (1+γ) f(X_1) ) ].     (10)

The last inequality implies that ν is absolutely continuous with respect to Q_1. To see this, let Γ ∈ A satisfy Q_1(Γ) = 0. Because the inequality holds for any f ∈ B(∑, ℝ), we can take f = ξ I_Γ, where ξ > 0 and I_Γ is the indicator function of Γ. Note that E_{Q_1}[exp((1+γ) ξ I_Γ)] = 1 when Q_1(Γ) = 0, so (10) yields I(ν) ≥ ξ ν(Γ) for every ξ > 0; since I(ν) < ∞, this forces ν(Γ) = 0.

In the next section we use Theorem 2 to establish the asymptotic optimality of the empirical likelihood (EL) test in a general Neyman-Pearson sense, showing that the asymptotic type II error probability achieves the lower bound indicated by the rate function of the large deviation results. Similar results for i.i.d. data are given by Zeitouni and Gutman (1991) (hereafter ZG) and Kitamura (2001), but our analysis is in the context of weakly dependent data.

The EL Test Statistic
Let {x_i}_{i=1}^n be a realization of a stationary α-mixing (and hence ergodic and S-mixing) process {X_t : t ∈ ℤ} taking values in ∑. We are interested in applying the EL method to test the following moment condition:

E_{Q_i}[ g(x_i, θ_0) ] = 0,     (11)

where the moment indicator g: ∑ × Θ → ℝ^m is continuous for every d-dimensional x_i ∈ ∑ and Q_i is the unknown distribution of x_i, i.e., Q_i is the marginal of Q at x_i. Also, θ_0 ∈ Θ ⊆ ℝ^p is the true parameter vector. We consider the over-identifying case where m ≥ p.
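As a toy instance of (11), take a scalar parameter (p = 1) with two moment restrictions (m = 2): g(x, θ) = (x − θ, (x − θ)² − 1) identifies the mean of data whose variance is assumed known to equal one. The sketch below (our illustrative model, not the paper's) checks that both sample moments are near zero at θ_0.

```python
import random

def g(x, theta):
    """Hypothetical moment indicator with m = 2 restrictions for a scalar
    parameter: mean theta, plus an (assumed known) unit variance."""
    return [x - theta, (x - theta) ** 2 - 1.0]

random.seed(0)
sample = [random.gauss(2.0, 1.0) for _ in range(20000)]
n = len(sample)
gbar = [sum(g(x, 2.0)[j] for x in sample) / n for j in range(2)]
print(gbar)  # both components close to 0 at theta_0 = 2
```

At any θ ≠ θ_0 the first component of the sample moment would drift away from zero, which is what the EL test exploits.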
If we assign each observation x_i a probability p_i, the EL method solves the following constrained maximization problem (Qin and Lawless, 1994):

max_{p_1, …, p_n, θ} ∑_{i=1}^n log p_i  subject to  ∑_{i=1}^n p_i = 1 and ∑_{i=1}^n p_i g(x_i, θ) = 0.

The test statistic is given by Kitamura (1997) using a blocking technique, Equation 12, where λ is a vector of Lagrange multipliers and the blockwise moments are defined in Equation 13, in which T is the number of blocks, M > 1 denotes the block length and L is the separation between block starting points. Now define

Q(θ) = { ν ∈ M_1(∑) : ∫ g(x, θ) dν(x) = 0 }.     (14)

Let Q = ∪_{θ∈Θ} Q(θ); thus Q is the set of probability measures which satisfy the moment condition for some parameter value in Θ. Hence the problem of testing (11) reduces to testing whether Q_1 ∈ Q on the basis of μ_n, where μ_n is the empirical measure of the observed data. Intuitively, the empirical likelihood test investigates whether the empirical measure μ_n, which the EL method constructs to be as close to the true probability measure as possible, is too far away from every measure in Q. Therefore, the test compares the K-L distance inf_{ν∈Q} H(μ_n || ν) with some threshold constant c > 0. That is to say, under the null hypothesis the empirical measure μ_n should be close to Q, and if the distance between μ_n and every probability measure in Q is too large, then we reject the null hypothesis. This also shows that the test depends on the data only through μ_n (see DZ). Thus the empirical likelihood test can be considered as a sequence of partitions Λ(n) = (Λ_1(n), Λ_2(n)) of M_1(∑), n = 1, 2, …, where

Λ_1(n) = { μ ∈ M_1(∑) : inf_{ν∈Q} H(μ || ν) < c },  Λ_2(n) = M_1(∑) \ Λ_1(n).     (18)

In the following we abbreviate (Λ_1(n), Λ_2(n)) as (Λ_1, Λ_2) for economy of notation, but their dependence on the sample size n should not be ignored. Since in this general framework pointwise bounds on error probabilities are not available (Kitamura, 2001), we consider the δ-smoothing of the set Λ_2:

Λ_2^δ = ∪_{μ ∈ Λ_2} B(μ, δ),

where B(μ, δ) denotes an open ball of radius δ around μ and the balls are taken in the Lévy metric, which is compatible with the weak, strong and uniform convergence of discrete probability measures (e.g., see ZG).
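A minimal computational sketch of the blockwise construction for a scalar moment g(x, θ) = x − θ (the function names and the bisection solver for the Lagrange multiplier λ are our illustrative choices; the Equation 12-13 construction is more general): block averages of length M are formed at starting points L apart, and the EL dual problem is solved for λ, assuming the block means straddle zero.

```python
import math
import random

def block_means(sample, theta, M, L):
    """Blockwise averages T_i(theta) of the scalar moment x - theta,
    using blocks of length M whose starting points are L apart."""
    T, start = [], 0
    while start + M <= len(sample):
        T.append(sum(x - theta for x in sample[start:start + M]) / M)
        start += L
    return T

def el_log_ratio(T):
    """Empirical log-likelihood ratio for E[T] = 0: solve the dual
    first-order condition sum_i T_i / (1 + lam * T_i) = 0 by bisection
    (assumes min(T) < 0 < max(T)), then return sum_i log(1 + lam * T_i)."""
    lo = -1.0 / max(T) + 1e-10  # keep every 1 + lam * T_i positive
    hi = -1.0 / min(T) - 1e-10
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(t / (1.0 + mid * t) for t in T) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return sum(math.log(1.0 + lam * t) for t in T)

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(400)]
T = block_means(data, 0.0, M=4, L=2)
ratio = el_log_ratio(T)
print(ratio)  # small when the moment condition holds at theta = 0
```

The straddle condition min(T) < 0 < max(T) is the convex-hull requirement for the EL problem to have an interior solution; when it fails the constrained likelihood is degenerate.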

Optimality Argument
To directly apply the large deviation properties of μ_n in Theorem 1 and Theorem 2 to establish the optimality of the EL test, we first need some tightness and continuity conditions.

Assumption 4
(a) sup_{θ∈Θ} ||g(x, θ)|| is bounded almost surely, and thus it is a random variable under every Q_1 ∈ Q. (b) The functional inf_{ν∈Q} H(μ || ν) is uniformly continuous in μ ∈ M_1(∑) in the τ-topology.
To see the other direction, notice that assumption 4(b) implies that every μ in { μ : inf_{Q_1∈Q} H(μ || Q_1) = c } is a limit point of Λ_2. Hence the lemma follows.
In the next theorem we show that the EL test is the uniformly most powerful test among all tests of the same size. This uniform optimality is sometimes called a universal property in information theory; see, e.g., ZG and DZ.