LAG SELECTION OF THE AUGMENTED KAPETANIOS-SHIN-SNELL NONLINEAR UNIT ROOT TEST

We provide simulation evidence that sheds light on several size and power issues in relation to lag selection of the augmented (nonlinear) KSS test. Two lag selection approaches are considered: the Modified AIC (MAIC) approach and a sequential General-to-Specific (GS) testing approach. Either one of these approaches can be used to select the optimal lag based on either the augmented linear Dickey-Fuller test or the augmented nonlinear KSS test, resulting in four possible selection methods, namely, MAIC, GS, N-MAIC and N-GS. The evidence suggests that using the asymptotic critical values of the KSS test tends to result in over-sizing if the (N-)GS method is used and under-sizing if the (N-)MAIC method is utilised. Thus, we recommend that the critical values be generated from finite samples. We also find evidence that the (N-)MAIC method has less size distortion than the (N-)GS method, suggesting that the MAIC-based KSS test is preferred. Interestingly, the MAIC-based KSS test with lag selection based on the linear ADF regression is generally more powerful than the test with lag selection based on the nonlinear version.


INTRODUCTION
There has been growing concern that the Augmented Dickey-Fuller (ADF) test, which is derived under a linear setting, may not possess good power when applied to the nonlinear but stationary processes that are appropriate to characterize some economic and/or financial time series (Michael et al., 1997; Taylor, 2001; Sollis, 2009). To respond to this concern, a range of unit-root tests have been developed under a variety of nonlinear frameworks (Enders and Granger, 1998; Kapetanios et al., 2003 (hereafter KSS); Bec et al., 2004; Sollis, 2009; Kilic, 2011). Among these, Kapetanios et al. (2003) is probably the most widely recognized and applied (Hanck (2012) notes the popularity of Kapetanios et al. (2003), "as evidenced by e.g., a Scopus citation count that is close to that of the most heavily cited Journal of Econometrics paper of the past five years". In addition, the KSS paper had been cited more than 200 times according to the ISI Web of Science by the end of March 2012). Kapetanios et al. (2003) propose a unit-root test using an auxiliary regression model that approximates the Exponential Smooth Transition Autoregressive (ESTAR) process by a Taylor series. The nonlinear KSS test is shown, in general, to be more powerful than the linear DF test under the alternative of a globally stationary ESTAR process.
In the case of the linear DF test, serial correlation in innovations is, as suggested by Said and Dickey (1984), approximated by an augmented autoregression with a truncated lag k. An important issue is the choice of the truncation lag k, which has vital size and power implications. Choosing an unnecessarily large k may reduce the power of the test, while considerable size distortion arises if k is set too small. Schwert (1989) found that in order to reduce size distortion, which is often due to a large negative Moving Average (MA) root in the innovations of the examined series, one has to choose a large autoregression lag. However, a large k can result in a nontrivial loss of power (DeJong et al., 1992). Popular Information Criteria (IC) such as the Akaike Information Criterion (AIC) and the Schwarz or Bayesian Information Criterion (BIC) may not deliver satisfactory size and power properties (Ng and Perron, 1995; 2001), as these criteria are not appropriate for modelling highly persistent (integrated) processes. Ng and Perron (1995) suggested an improved data-driven lag selection procedure using a General-to-Specific (GS) approach based on sequential testing for the significance of the coefficient on the highest autoregression lag. The GS approach yields a test with better size control but it tends to be over-parameterized on some occasions, leading to unwanted power loss. Ng and Perron (2001), on the other hand, developed a class of modified information criteria with an additional data-driven penalty factor that is appropriate for integrated time series. In particular, Ng and Perron (2001) demonstrated via simulations that the Modified AIC (MAIC) is superior to the conventional AIC and BIC, and to the modified BIC, in controlling size when MA errors are present. Besides, unlike the GS procedure, the MAIC is shown to exhibit better power properties as it is less inclined to over-parameterization.
Science Publications JMSS
Like the DF-type test, in practice, the KSS test has to handle serial correlation in innovations. Kapetanios et al. (2003) assume that the serial correlation enters in a linear fashion and suggest an augmented test that is similar to the augmented DF test. Kapetanios et al. (2003) stated that "standard model selection criteria or significance testing procedure (can) be used for this purpose because under the null of a linear model, the properties of these criteria are well understood." In applied works, different lag selection methods are used. They include the AIC and BIC (Pesaran et al., 2009), the BIC (Ghoshray, 2010), the AIC (Cuestas and Garratt, 2011), the Modified AIC (MAIC) (Yau and Nieh, 2009) and the sequential-testing GS procedure proposed by Ng and Perron (1995) (Kapetanios et al., 2003; Chortareas and Kapetanios, 2004; Bahmani-Oskooee et al., 2007; 2008; Baharumshah et al., 2009) (It appears that empirical works applying the KSS test tend to favour the GS procedure over other lag selection criteria).
This study provides simulation evidence to shed light on several unanswered issues regarding lag selection in the augmented KSS test. First, while it is sensible to expect that under the common null (unit root) hypothesis the statistical properties of the lag selection methods should be much alike across linear and nonlinear unit-root testing schemes (Kapetanios et al., 2003), it is less clear whether the statistical properties would remain similar under the alternatives (as the ADF test assumes a linear alternative but the KSS test assumes a nonlinear one). Second, for practitioners, it is imperative to know whether the MAIC, as implemented in the ADF test, is also the preferred lag selection method when the KSS test is applied. Specifically, would the MAIC outperform the seemingly more popular GS procedure for the KSS test? Third, though it is natural to construct the lag selection criteria based on the nonlinear auxiliary KSS regression when the KSS test is applied (as the majority of practitioners do), there is no a priori ground for ruling out other approaches that may produce better results. For example, is it possible that a "hybrid" approach, namely establishing the optimal lag first within the linear ADF scheme and then applying the selected lag to the KSS test, achieves better size and power properties than lags associated with the nonlinear KSS regression (A hybrid approach is considered in Perron and Qu (2007), but in a different context. To the best of our knowledge, Gustavsson and Osterholm (2007) is the only paper that takes this approach)? Fourth, there is a concern in the literature that the critical values from the asymptotic DF distribution might be distant from the critical values based on small samples and that the issue can be exacerbated when the unit root tests are lag-augmented. In particular, using the asymptotic critical values may distort size and power (Cheung and Lai, 1995; Cook and Manning, 2002; Wu, 2010). 
The same issue arises in the case of the KSS test as well where there is a need to get lag-based finite-sample critical values; however, these critical values are not available in the literature. This study therefore sets out to examine and resolve these four issues by Monte Carlo simulation.
The study proceeds as follows. Materials and Methods gives a brief review of augmentation lag selection in unit root tests. Results reports the Monte Carlo results comparing the MAIC and the GS procedure in the KSS test. Discussion summarizes the findings and gives an application example, followed by the conclusion.

Unit Root Tests and Augmentation Lag Selection
Let y_t, t = 1, 2, …, T, be an observed time series. Our aim is to distinguish whether y_t is a unit root or a stationary process. The ADF test due to Dickey and Fuller (1979) and Said and Dickey (1984) is a t-test for the null hypothesis β = 0 (unit root) against the alternative β < 0 (stationary) in the autoregression (Equation 1):

$$\Delta y_t = \beta y_{t-1} + \sum_{j=1}^{k} \phi_j \Delta y_{t-j} + e_t \qquad (1)$$

where, under the 2-step approach, y_t is first demeaned (Case A) or detrended (Case B) by OLS (The deterministic terms can alternatively be included directly in the test regression instead of the 2-step approach considered in this study. The two approaches share the same asymptotic results, though there are minor differences in finite samples. The 2-step approach is considered because it allows the trend function to be handled by both the ADF and the KSS test in the same way; in this study, we focus only on the unit root tests that are based on OLS detrending. It is well-known that the power of the DF test can be much improved when GLS detrending is used, if the initial value is set to zero (Elliott et al., 1996). However, with large initial deviations, the GLS-based test suffers dramatic power loss and is dominated by the OLS-based test. Harvey et al. (2009; 2012) propose a "Union of Rejection (UR)" testing strategy that is able to produce a power that traces the higher of the DF tests based on different ways of detrending. This is certainly a new avenue for the nonlinear unit root tests. Su and Nguyen (2012) suggest a modified KSS test based on the UR strategy but they assume zero lag. Research combining the issues of different detrending methods and lag augmentation for the KSS test is currently being pursued by the authors).
In practice, one may choose the lag k based on standard information criteria such as the AIC and BIC as earlier studies have done, but neither achieves robust and satisfactory results (Ng and Perron, 1995). Ng and Perron (1995) instead suggested using a data-driven procedure, where the highest lag is sequentially tested. Specifically, the most general model with a maximum lag, k_max, is chosen and the coefficient of the highest lag (φ_{k_max}) is tested. If φ_{k_max} is significantly different from zero, the optimal lag is set as k_max; otherwise, a model with k_max - 1 lags is considered and the significance of the coefficient of its highest lag, φ_{k_max-1}, is tested. The procedure continues until a significant highest lag is found (and hence the optimal lag is established). Since this procedure starts the search for the optimal lag from the most general setting (with k_max lags) and winds down to a more specific one (with fewer lags), it is often referred to as a General-to-Specific (GS) procedure. The sequential-testing GS procedure yields a unit root test with improved size (compared with the tests based on the AIC and BIC) but it tends to over-parameterize, thereby resulting in a loss of power. Ng and Perron (2001), on the other hand, developed an information criterion that is adequate for highly persistent (integrated) series, called the Modified AIC (MAIC) (A modified form of the BIC (MBIC) is considered in Ng and Perron (2001) as well. However, since the MAIC outperforms the MBIC as far as size is concerned, the MBIC will not be considered in this study). The approach estimates the lag as follows (Equation 2):

$$k_{MAIC} = \arg\min_{0 \le k \le k_{max}} MAIC(k), \qquad MAIC(k) = \ln(\hat{\sigma}_k^2) + \frac{2(\tau_T(k) + k)}{T - k_{max}} \qquad (2)$$

where $\tau_T(k) = (\hat{\sigma}_k^2)^{-1} \hat{\beta}^2 \sum_{t=k_{max}+1}^{T} y_{t-1}^2$, $\hat{\sigma}_k^2 = (T - k_{max})^{-1} \sum_{t=k_{max}+1}^{T} \hat{e}_{t,k}^2$, $\hat{e}_{t,k}$ are the residuals of (1) with k lags and $\hat{\beta}$ is the OLS estimate of β (For the purpose of meaningful comparison, following Ng and Perron (2001), MAIC(k) is computed using the same number (T - k_max) of residuals over different k. See also the discussion in Ng and Perron (2005) but in a different context). 
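The two selection rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GAUSS code used in this study: the input `y` is assumed to be already demeaned or detrended, the function names are hypothetical, the GS rule uses a two-sided 10% normal critical value (1.645) chosen only for illustration, and both rules keep the estimation sample fixed at the observations that remain after dropping `kmax` initial values.

```python
import numpy as np

def adf_design(y, k, kmax):
    """Regressand and regressors of the ADF regression with k lagged
    differences, always estimated on the sample left after kmax lags."""
    dy = np.diff(y)                                   # dy[i] = y[i+1] - y[i]
    lhs = dy[kmax:]                                   # Delta y_t
    cols = [y[kmax:-1]]                               # y_{t-1}
    cols += [dy[kmax - j:len(dy) - j] for j in range(1, k + 1)]  # Delta y_{t-j}
    return lhs, np.column_stack(cols)

def ols(lhs, X):
    """OLS coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    return coef, lhs - X @ coef

def select_lag_maic(y, kmax):
    """Lag minimising MAIC(k) = ln(sigma2_k) + 2 (tau_T(k) + k) / n,
    where n plays the role of T - kmax."""
    best_k, best_val = 0, np.inf
    for k in range(kmax + 1):
        lhs, X = adf_design(y, k, kmax)
        coef, resid = ols(lhs, X)
        n = len(lhs)
        sigma2 = resid @ resid / n
        tau = coef[0] ** 2 * np.sum(X[:, 0] ** 2) / sigma2   # extra penalty term
        value = np.log(sigma2) + 2.0 * (tau + k) / n
        if value < best_val:
            best_k, best_val = k, value
    return best_k

def select_lag_gs(y, kmax, crit=1.645):
    """General-to-Specific rule: start at kmax and drop the highest lag
    until its t-ratio is significant (crit is an assumed threshold)."""
    for k in range(kmax, 0, -1):
        lhs, X = adf_design(y, k, kmax)
        coef, resid = ols(lhs, X)
        s2 = resid @ resid / (len(lhs) - X.shape[1])
        se_last = np.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])
        if abs(coef[-1] / se_last) > crit:
            return k
    return 0
```

For a demeaned series `y`, `select_lag_maic(y, kmax=8)` and `select_lag_gs(y, kmax=8)` each return an integer lag between 0 and 8; the value 8 for `kmax` is only an example.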
Note that the MAIC is the same as the AIC except that it includes an additional penalty term τ_T(k), which can better capture the relevant cost among different lag selections in finite samples. Ng and Perron (2001) (2007) for a similar discussion on this issue). Besides, since the MAIC is less inclined to over-parameterize, it is expected to produce more powerful testing results than when the GS procedure is implemented (Ng and Perron (2001) did not include the GS procedure in their simulations. Moreover, results from Ng and Perron (1995; 2001) are not directly comparable as different detrending methods are used (OLS in Ng and Perron (1995) versus GLS in Ng and Perron (2001)). However, our simulation results show that, in general, the power of the MAIC-based tests is superior to that of the tests based on the GS approach).
The nonlinear unit root test of Kapetanios et al. (2003) is motivated by the observation that many time series (such as real exchange rates) exhibit local nonstationarity (unit root) but are stationary globally (Michael et al., 1997), and that nonlinear models, such as the Exponential Smooth Transition Autoregressive (ESTAR) model, could produce a better fit of these series than linear models. Taylor (2001) shows with Monte Carlo evidence that the linear DF test does not have good power against a stationary ESTAR process. Kapetanios et al. (2003) consider the following ESTAR model (Equation 3):

$$\Delta y_t = \rho y_{t-1} + \gamma y_{t-1}\left[1 - \exp(-\theta y_{t-1}^2)\right] + u_t \qquad (3)$$

where u_t is a stationary innovation. When ρ = θ = 0, y_t is a unit root process, Δy_t = u_t. On the other hand, with θ > 0, y_t follows a nonlinear but globally stationary process, provided that -2 < ρ + γ < 0. Using a first-order Taylor series approximation, 1 - exp(-θ y²) ≈ θ y² (and imposing ρ = 0), KSS propose a unit root test based on the following auxiliary equation with lag augmentation (Equation 4):

$$\Delta y_t = \delta y_{t-1}^3 + \sum_{j=1}^{h} \psi_j \Delta y_{t-j} + e_t \qquad (4)$$

where δ = γθ, with the null hypothesis δ = 0 against the alternative δ < 0. Lag order h, as suggested by Kapetanios et al. (2003), can be chosen by the standard IC or the GS procedure of Ng and Perron (1995). As pointed out in the Introduction, applied works with the KSS test seem to favour the GS procedure. Given that the standard IC are not suitable for integrated series and the GS procedure tends to over-parameterize as far as the ADF test is concerned, the same issues are expected for the KSS test as well. In addition, an often neglected issue is that, whenever the GS procedure is used, there are two possible lag selection approaches. The first and natural one is to base the lag selection on the auxiliary KSS regression (4), while the second one is to use a lag chosen in the linear ADF setting (1) for the KSS test (To the best of our knowledge, no applied works using the GS procedure for the KSS test set the lag based on the ADF regression).
It is an open question whether these two approaches will generate similar results or not. For clarity, we shall denote GS as the GS procedure based on (1) and N-GS as that based on (4).
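As an illustration of the ESTAR model and the lag-augmented KSS regression, the sketch below simulates an ESTAR path with ρ = 0 and computes the t-ratio on the cubed lagged level. It is a minimal sketch under assumptions that are ours: innovations are i.i.d. N(0,1), the path starts at zero with a discarded burn-in, the series is demeaned before testing (Case A), and the names `simulate_estar` and `kss_stat` are hypothetical.

```python
import numpy as np

def simulate_estar(T, gamma, theta, rng, burn=500):
    """ESTAR path with rho = 0:
    y_t = y_{t-1} + gamma * y_{t-1} * (1 - exp(-theta * y_{t-1}^2)) + eps_t."""
    eps = rng.standard_normal(T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        prev = y[t - 1]
        y[t] = prev + gamma * prev * (1.0 - np.exp(-theta * prev ** 2)) + eps[t]
    return y[burn:]                                   # drop the burn-in observations

def kss_stat(y, h):
    """t-ratio on y_{t-1}^3 in the regression of Delta y_t on y_{t-1}^3
    and h lagged differences."""
    dy = np.diff(y)
    lhs = dy[h:]
    cols = [y[h:-1] ** 3]                             # cubed lagged level
    cols += [dy[h - j:len(dy) - j] for j in range(1, h + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    s2 = resid @ resid / (len(lhs) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0] / se

rng = np.random.default_rng(0)
y = simulate_estar(200, gamma=-1.0, theta=0.05, rng=rng)
print(kss_stat(y - y.mean(), h=2))
```

The lag `h` passed to `kss_stat` can come from either the linear ADF regression or the KSS regression itself, which is exactly the distinction between the GS and N-GS rules.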
On the other hand, the augmentation lag can be determined by the MAIC (the preferred criterion for the ADF test) when the KSS test is used. The issue, once again (as in the case of the GS procedure), is that it is unclear whether the MAIC should be constructed based on the linear ADF setting or the nonlinear KSS setting and whether the two would produce different results. If the former is preferred, one can simply apply the lag obtained from the ADF test to the KSS test. However, if it is the latter, then we need to build a nonlinear version of the MAIC. To this end, we suggest, along the line of the construction of the MAIC based on the DF regression (1), to construct a nonlinear MAIC (N-MAIC) on the basis of the KSS regression (4) as follows (Equation 5): (It is trivial to prove that the N-MAIC in Equation (5) holds for the approximate ESTAR model (i.e., Equation (4)). The proof is available from the authors upon request).
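Since the displayed form of Equation (5) is not available in this text, the following is only a hedged sketch of what an N-MAIC could look like if the Ng and Perron (2001) penalty is carried over to the KSS regression by replacing the lagged level with its cube and the estimate of β with the estimate of δ. That substitution and the names `kss_design` and `select_lag_nmaic` are illustrative assumptions and may differ from the exact form of Equation (5).

```python
import numpy as np

def kss_design(y, h, hmax):
    """Regressand and regressors of the KSS regression with h lagged
    differences, estimated on the sample left after hmax lags."""
    dy = np.diff(y)
    lhs = dy[hmax:]
    cols = [y[hmax:-1] ** 3]                          # y_{t-1}^3
    cols += [dy[hmax - j:len(dy) - j] for j in range(1, h + 1)]
    return lhs, np.column_stack(cols)

def select_lag_nmaic(y, hmax):
    """Assumed nonlinear analogue of the MAIC:
    ln(sigma2_h) + 2 (tau(h) + h) / n, with
    tau(h) = delta_hat^2 * sum(y_{t-1}^6) / sigma2_h (an assumption)."""
    best_h, best_val = 0, np.inf
    for h in range(hmax + 1):
        lhs, X = kss_design(y, h, hmax)
        coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
        resid = lhs - X @ coef
        n = len(lhs)
        sigma2 = resid @ resid / n
        tau = coef[0] ** 2 * np.sum(X[:, 0] ** 2) / sigma2
        value = np.log(sigma2) + 2.0 * (tau + h) / n
        if value < best_val:
            best_h, best_val = h, value
    return best_h
```

The only difference from a linear MAIC routine is the first regressor, which makes it easy to compare the lags chosen by the linear and nonlinear versions on the same series.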
In this study, we are particularly interested in the performance of the augmented KSS test when the augmentation lag selection is based on one of the following four rules: MAIC, N-MAIC, GS and N-GS. While lag selection strategies will not affect the asymptotic distribution of the KSS statistic, the finite-sample distribution may be rather different across lag-selection rules and far from the asymptotic distribution. As a consequence, using critical values from the asymptotic distribution in small-sample simulations may lead to erroneous conclusions (see a similar issue in the ADF context discussed in Cook and Manning (2002) and Wu (2010)). To resolve the issue, we use critical values generated from finite (small) samples; the results are shown in Table 1.
In Table 1, we tabulate the simulated finite-sample critical values of the KSS test under each of the lag selection strategies (The critical values are obtained from simulation using GAUSS with 100,000 replications). By comparing the critical values in Table 1 with those of the asymptotic distribution given in Kapetanios et al. (2003, Table 1), it can be seen that the finite-sample critical values are distant from the asymptotic ones, especially when the sample size (T) is small, despite a tendency for the critical values under different lag selection procedures to converge. Taking T = 100 and Case A (level) as an example, the finite-sample critical values at the 5% level are -2.698, -3.058, -2.768 and -3.082 for the KSS tests based on the MAIC, GS, N-MAIC and N-GS, respectively, while the corresponding asymptotic critical value is -2.93. Because the test rejects for values below the critical value, using the asymptotic value tends to make the test over-sized with (N-)GS and under-sized with (N-)MAIC.
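The way such finite-sample critical values can be obtained is sketched below in Python. This is an illustration under assumptions rather than the GAUSS program behind Table 1: the null model is a driftless random walk with i.i.d. N(0,1) errors, the series is demeaned (Case A), the number of replications is kept small, and `finite_sample_cv`, `kss_stat` and the `select_lag` argument are hypothetical names.

```python
import numpy as np

def kss_stat(y, h):
    """t-ratio on y_{t-1}^3 in the KSS regression with h lagged differences."""
    dy = np.diff(y)
    lhs = dy[h:]
    cols = [y[h:-1] ** 3]
    cols += [dy[h - j:len(dy) - j] for j in range(1, h + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    s2 = resid @ resid / (len(lhs) - X.shape[1])
    return coef[0] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])

def finite_sample_cv(T, select_lag, reps, rng, level=0.05):
    """Lower-tail quantile of the KSS statistic under the unit root null,
    with the lag chosen afresh in every replication by select_lag(y)."""
    stats = np.empty(reps)
    for r in range(reps):
        y = np.cumsum(rng.standard_normal(T))         # random walk under the null
        y = y - y.mean()                              # Case A: demeaning
        stats[r] = kss_stat(y, select_lag(y))
    return np.quantile(stats, level)

rng = np.random.default_rng(0)
# Fixed zero lag as the simplest possible "rule"; swap in a data-driven rule
# to obtain rule-specific critical values.
cv = finite_sample_cv(T=100, select_lag=lambda y: 0, reps=2000, rng=rng)
print(round(cv, 3))
```

Passing a data-driven lag rule in place of `lambda y: 0` yields critical values specific to that rule, which is the comparison Table 1 reports.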

Monte Carlo Results
We report the results of Monte Carlo simulations designed to investigate the size and power performance of the KSS test combined with the four different augmentation lag selection strategies, namely, MAIC, N-MAIC, GS and N-GS. For the purpose of comparison, we also report the results from the ADF test (but only the linear lag selection criteria, MAIC and GS, are considered) (Note that the critical values for the ADF test are obtained in a way similar to those of the KSS test. To save space, they are not reported in this study but are available upon request). For each simulation, we compute the rejection frequency of the null hypothesis at the 5% level, and we consider sample sizes T = 100 and 200 under Case A (level) and Case B (trend), respectively. In line with the literature, we set the trend function in the simulations equal to zero since all the tests considered are similar. To alleviate the effect of initial values, an additional 500 initial observations are generated in each simulated path and then discarded. All simulations are performed in GAUSS with 20,000 replications.

We first examine the size based on two sets of simulated integrated paths y_t = y_{t-1} + u_t, t = 1, ..., T, one with AR errors, u_t = ρu_{t-1} + ε_t, and the other with MA errors, u_t = ε_t + ηε_{t-1}, assuming that ε_t is i.i.d. N(0,1). We report the results with AR errors in Table 2. The results with MA errors are presented in Table 3.
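The two data-generating processes used for the size experiments can be sketched as follows. This is a minimal Python illustration, not the GAUSS code of the study; the function names are hypothetical, the burn-in of 500 discarded observations follows the simulation design described above, and setting both `rho` and `eta` to non-zero values (an ARMA error) is allowed by the code but is not a case considered here.

```python
import numpy as np

def simulate_unit_root(T, rng, rho=0.0, eta=0.0, burn=500):
    """Integrated path y_t = y_{t-1} + u_t with
    AR errors (u_t = rho * u_{t-1} + eps_t) when eta = 0, or
    MA errors (u_t = eps_t + eta * eps_{t-1}) when rho = 0."""
    n = T + burn
    eps = rng.standard_normal(n + 1)                  # one extra draw for eps_{t-1}
    u = np.empty(n)
    prev_u = 0.0
    for t in range(n):
        # eps[t + 1] is the current shock and eps[t] the previous one
        u[t] = rho * prev_u + eps[t + 1] + eta * eps[t]
        prev_u = u[t]
    y = np.cumsum(u)
    return y[burn:]                                   # discard the burn-in

def rejection_rate(stats, crit):
    """Share of replications whose statistic falls below the critical value."""
    return float(np.mean(np.asarray(stats) < crit))
```

Empirical size is then the value of `rejection_rate` over the statistics computed on paths from `simulate_unit_root`, evaluated at the 5% critical value of the lag selection rule in use.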

An Application to Gross Domestic Product (GDP)
Finally, we report in Table 5 an application of the unit root tests to the GDP of OECD countries. Panel A of Table 5 shows the unit root test results while Panel B reports the results on optimal lag length selection.

DISCUSSION
As shown in Table 2, the augmented KSS test is subject to moderate size distortion with AR errors (generally, over-sized with a positive AR coefficient (ρ) and under-sized with a negative one). Besides, size distortion is relatively larger in Case B (trend) than in Case A (level) and the distortion becomes less noticeable as the sample size (T) increases. Among the four different lag selection strategies, the MAIC and N-MAIC tend to show somewhat better size control (closer to the nominal size, 5%) than the GS and N-GS procedures. The size distortion of the ADF test is similar to that of the KSS test.
The results in Table 3 can be summarized as follows. When the MA parameter η is positive, the KSS test is slightly under-sized if the MAIC or N-MAIC is implemented and slightly over-sized if the GS or N-GS procedure is used. On the other hand, when the MA parameter is negative, size distortion becomes harder to control: size distortion is more serious with a larger MA coefficient (in magnitude) and/or with a trend (i.e., Case B) but is less so as T increases.
Among the four lag selection rules, the GS procedure appears to be the worst (with η = -0.8 and T = 100, the size is as large as 0.222 and 0.377 in Case A and Case B, respectively), followed by the N-GS procedure. The MAIC and N-MAIC tend to work better in size control and they are competitive with each other. Interestingly, with a negative MA parameter, the MAIC tends to be associated with over-sizing while the N-MAIC tends to be associated with under-sizing. Compared with the KSS test, in general, the DF test is less size-distorted when η > 0 but more over-sized when η < 0.
We comment on the results of Table 4 as follows. First, the KSS test gains power as θ gets larger (given γ) and as |γ| gets larger (given θ) and, as expected, the test is relatively more powerful in Case A (level) and with a larger sample size (T = 200). Second, the KSS test with lag selection based on the linear ADF regression is generally more powerful than its nonlinear counterparts (MAIC vs. N-MAIC and GS vs. N-GS) (Occasionally, the N-MAIC outperforms the GS procedure in terms of power). Third, the MAIC appears to outperform the GS procedure. The MAIC-based KSS test achieves the highest power in most cases in Case A (particularly, when θ is small) and all cases in Case B (As pointed out in Kapetanios et al. (2003), when θ grows larger, the series becomes less persistent. This implies that the MAIC-based KSS test is more powerful than its GS-based counterpart but may not be so when the examined series is less persistent). Fourth, the augmented KSS test, in general, has considerable power advantages over the ADF tests in testing against the stationary ESTAR alternatives.
Thus, the KSS test is more powerful than the DF test against the stationary ESTAR models, not only in the special case that sets the lag equal to zero (Kapetanios et al., 2003) but also when augmentation lag selection is implemented. Interestingly, compared to the simulation results of Table 3 in Kapetanios et al. (2003), we notice that there is a power loss in both the KSS and ADF tests with lags. In particular, the ADF test appears to suffer a somewhat larger power loss than the KSS test. As a consequence, there are cases where the ADF test is more powerful than the KSS test when the lag is set at zero, but the ranking is reversed once augmentation lags are considered. For example, with T = 100, Case B and (γ, θ) = (-1, 0.05), the rejection rates are 0.910 (KSS) and 0.934 (DF) when the lag is set at zero (Case 3 of Table 3 in Kapetanios et al. (2003)), but they become 0.717 (KSS) and 0.547 (ADF) when the MAIC is implemented.
As for the empirical example, Panel B of Table 5 reveals that the optimal lag length of GDP varies considerably across OECD countries. It also depends on which lag length selection method is used. As the MAIC-based KSS test is found to have good size and power, we focus our discussion on its result. In particular, the MAIC-based KSS test suggests that GDP is nonstationary in most of the countries. The exceptions are Canada, Denmark, Finland, France and Germany, where GDP is consistent with a globally stationary ESTAR process. For the purpose of comparison, we also report the ADF test results. In contrast, the ADF tests do not reject the unit root null hypothesis for any of the countries considered.

CONCLUSION
In this study, we provide simulation evidence that sheds light on several size and power issues in relation to lag selection of the augmented (nonlinear) KSS test. Two lag selection approaches are considered: the Modified AIC (MAIC) approach and a sequential General-to-Specific (GS) testing approach. One may use either one of these approaches to select the optimal lag based on either the augmented linear ADF test or the augmented nonlinear KSS test, resulting in four selection methods, namely, MAIC, GS, N-MAIC and N-GS. The evidence suggests that, with the asymptotic critical values, the KSS test tends to be over-sized if the (N-)GS method is used and under-sized if the (N-)MAIC method is used. Thus, we recommend that the critical values be generated from finite samples. We also find evidence that the (N-)MAIC method has less size distortion than the (N-)GS method, suggesting that the MAIC-based KSS test is preferred. Interestingly, we also find that the MAIC-based KSS test with lag selection based on the linear ADF regression is generally more powerful than the test with lag selection based on the nonlinear version.