An Empirical Study of Robust Modified Recursive Fits of Moving Average Models

Abstract: The time-series Moving Average (MA) model is a nonlinear model; see, for example, (Fuller, 1996). Several algorithms are available for computing its traditional Least Squares (LS) fit. Because the model is nonlinear, Fuller discusses a Newton-type step algorithm. (Koreisha and Pukkila, 1989) proposed a recursive algorithm based on a sequence of three linear LS regressions. In this study, we robustify Koreisha and Pukkila's algorithm by replacing these LS fits with robust fits. We selected an efficient, high breakdown robust fit that has good properties for skewed as well as symmetrically distributed random errors; other robust estimators, however, can be chosen. We present the results of a simulation study comparing our robust modification with the Maximum Likelihood Estimates (MLE) in terms of efficiency and forecasting. Our robust modification has relatively high empirical efficiency relative to the MLE under normally distributed errors, while it is much more efficient for heavy-tailed error distributions, including heavy-tailed skewed distributions.


Introduction
The Moving Average (MA) model is one of the simplest and most easily interpretable time series models. For these models, the observed response is modeled as a linear combination of coefficients and lagged random errors (shocks). Although it is easy to describe, this model is nonlinear in its coefficients and is not that easy to fit. In contrast, the Autoregressive (AR) time-series model is a linear combination of coefficients and lagged observations. Thus, the AR model is linear in its coefficients and can easily be fit by any regression procedure. (Fuller, 1996) discusses the usual nonlinear Least Squares (LS) Gauss-Newton algorithm for fitting an MA model as well as the Maximum Likelihood Estimates (MLE) under the assumption of normally distributed random errors. (Hannan and Rissanen, 1982) developed a recursive algorithm for the fit of an MA model. (Koreisha and Pukkila, 1989) proposed the Innovative Substitution (IS) recursive algorithm for the fit of an MA model. This IS recursive algorithm depends on a series of LS fits and is easy to compute. The authors of these papers showed that the estimates obtained by their recursive algorithms are asymptotically equivalent to the MLEs.
In this study, we consider the empirical properties of simple robust modifications to the IS recursive algorithm of (Koreisha and Pukkila, 1989). The modification involves replacing the LS fits of the IS algorithm with robust fits. We call these estimators Modified Innovative Substitution (MIS) estimators. We selected a robust fit with several properties in mind. We want the robust fit to have high efficiency for normally distributed random errors as well as for contaminated distributions. The lagged error structure in an MA model implies that outliers in the Y-space (response space) will become outliers in the X-space (fitting space); therefore, high breakdown robust estimates are preferred. Also, in practice, the distribution of the random errors is often skewed, so the robust estimator should have good efficiency properties under both symmetric and asymmetric random error distributions. One such estimator is the High Breakdown Rank-based (HBR) estimator developed by Chang et al. (1999). Its properties for AR models were developed by Terpstra et al. (2000). In the empirical study of AR models discussed in (Terpstra et al., 2000), the HBR was one of the best estimators in terms of empirical efficiency over a wide range of random error distributions, including symmetric as well as asymmetric random error distributions.
The main goal of this study is to investigate the empirical properties of the MIS estimators, so we have conducted two large Monte Carlo studies on the efficiency of estimation and forecasting, comparing the MIS, IS, and MLE procedures. The studies cover MA models of orders one through four, from small (n = 50) to large (n = 500) sample sizes, and several settings of scale. A variety of error distributions are used, including the normal, contaminated normal, and skewed contaminated normal. Three levels of contamination are used: 10, 20, and 30%. Both Innovative (IO) and Additive (AO) outlier situations are considered. These studies confirm the efficiency of the MIS estimates over this wide range of MA models and random error distributions. In contrast, the traditional estimators, IS and MLE, have poor efficiency over all the non-normal distributions considered. The study also shows that, of these two nonrobust estimators, the MLE has better empirical efficiency than the IS estimator over all of the situations.
We used the R package hbrfit for the computations in this study. It is freely downloadable at the site https://github.com/kloke/hbrfit.

Estimates for Moving Average Models
Let Yt, t = 1, 2, ..., n, denote a stationary time-series of length n. In this study, we focus on the Moving Average model of order q, MA(q), which is defined below in expression (4). In this section, we present the Modified Innovative Substitution (MIS) estimators of the coefficients of a Moving Average model (MA). The algorithm that we are modifying for the fit of an MA model depends on an Initial Autoregressive (AR) fit. So we briefly present the AR model first.

Autoregressive Time Series Models
We say that Yt follows a stationary autoregressive time-series model of order p, denoted here by AR(p), if:

Yt = ϕ0 + ϕ1Yt−1 + ⋯ + ϕpYt−p + ϵt, (1)

where ϕ0, ..., ϕp are the AR parameters that satisfy the stationarity assumption; i.e., the solutions to the equation

x^p − ϕ1x^{p−1} − ⋯ − ϕp = 0

lie in the interval (−1, 1); see (Box et al., 2008). Computationally, Model (1) is a linear model with Yt as the tth response and the lagged Y's, Yt−j, j = 1, ..., p, as the predictors. Hence, robust estimates and subsequent analyses for the AR(p) are easily computed.
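Because Model (1) is linear in its coefficients, an LS fit reduces to a single regression of Yt on the lagged responses. The following minimal sketch illustrates this with numpy; the function name and the simulated AR(1) example are ours, not from the paper.

```python
import numpy as np

def fit_ar_ls(y, p):
    """Fit an AR(p) model by ordinary least squares.

    Regresses Y_t on an intercept and the lagged values
    Y_{t-1}, ..., Y_{t-p}; returns (phi0, phi1, ..., phip).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix: intercept plus the p lagged columns.
    X = np.column_stack(
        [np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)]
    )
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

# Simulate a stationary AR(1) with phi1 = 0.5 and recover the slope.
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.5 * y[t - 1] + e[t]
phi = fit_ar_ls(y, 1)
```

Any robust regression routine can be substituted for `np.linalg.lstsq` here, which is what makes robust AR fits straightforward to compute.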
There has been considerable work done on robust fits of AR models. (Koul and Saleh, 1993) developed the asymptotic theory for rank-based fits of these models. Because of the lagged-response structure of the autoregressive model, error distributions with even moderately heavy tails produce outliers in factor space (points of high leverage). With this in mind, the high breakdown rank-based (HBR) estimates of (Chang et al., 1999) seem more appropriate. HBR estimates for the coefficients of AR(p) models are developed in (Terpstra et al., 2000). These articles present the asymptotic theory for the HBR estimates for AR(p) models; useful practical diagnostics, including Studentized residuals and diagnostics which differentiate among the HBR and other fits; and the results of Monte Carlo studies which show that the robustness and efficiency of the rank-based estimators hold for finite sample sizes. More discussion of the HBR estimator is presented in the following section.

Estimators for Moving Average Models
The moving average model of order q is defined as:

Yt = ϵt + θ1ϵt−1 + ⋯ + θqϵt−q, (4)

where θi, i = 1, 2, ..., q, are the MA model parameters and the ϵt are uncorrelated random variables with mean 0 and variance σ2 < ∞. This can equivalently be written in terms of the backshift operator B as:

Yt = θ(B)ϵt,

where:

θ(B) = 1 + θ1B + ⋯ + θqB^q.

The tth error of this series can be written recursively as:

ϵt = Yt − θ1ϵt−1 − ⋯ − θqϵt−q. (6)

Moving average models are said to be invertible if they can be expressed as an infinite order autoregressive model. This requires that the roots of the characteristic equation:

x^q + θ1x^{q−1} + ⋯ + θq = 0

are less than one in absolute value; see, for example, Chapter 2 of (Fuller, 1996). For example, the MA(1) model is invertible if −1 < θ1 < 1, while the MA(2) model is invertible if −1 < θ2 < 1, θ2 + θ1 > −1, and θ1 − θ2 < 1.
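The invertibility condition can be checked numerically by examining the roots of the characteristic polynomial. The short sketch below does this with numpy; the helper name is ours.

```python
import numpy as np

def is_invertible(theta):
    """Check invertibility of an MA(q) with coefficients theta = (theta_1, ..., theta_q).

    The model is invertible when every root of
    x^q + theta_1 x^(q-1) + ... + theta_q = 0
    is less than one in absolute value.
    """
    roots = np.roots([1.0] + list(theta))
    return bool(np.all(np.abs(roots) < 1.0))
```

For instance, `is_invertible([0.5, 0.3])` confirms the MA(2) conditions above: |0.3| < 1, 0.5 + 0.3 > −1, and 0.5 − 0.3 < 1.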
As discussed in (Fuller, 1996), the MA model is nonlinear in terms of the parameters θi, i = 1, ..., q. Fuller discusses fitting the MA model by using a Gauss-Newton type algorithm. He also discusses an algorithm for computing the Maximum Likelihood Estimates (MLE), assuming normally distributed errors. (Hannan and Rissanen, 1982) developed a recursive linear algorithm for estimating the MA coefficients. (Koreisha and Pukkila, 1989) proposed a similar recursive linear algorithm, the Innovative Substitution (IS) algorithm. Their recursive method is obtained by applying Ordinary Least Squares (OLS) methods to the MA model after replacing lagged errors with the corresponding lagged residuals from an initial long autoregressive fit. The algorithm is given by the following four steps:

1. Obtain the estimated residuals from the fit of a long autoregressive model, AR(pn), of order pn.
2. Regress the observations on the lagged estimated residuals from the previous step using the linear regression model (4).
3. Use the estimated regression coefficients from Step 2 as the θ values to estimate the errors recursively using Eq. (6).
4. Plug the estimated errors of Step 3 into the model as predictor variables and regress the observations on these predictors to obtain the final estimates.

As shown in our simulation study, there is very little loss in empirical efficiency between these IS estimates and the maximum likelihood estimates when the errors have a normal distribution. Like the MLE estimates, though, our study shows that the IS estimates are quite sensitive to errors with heavy-tailed distributions. To overcome these problems, we propose two modified innovative substitution methods. The first Modified IS estimator (MIS1) consists of replacing the LS fit in Step 4 with a robust fit, while the second (MIS2) replaces all the LS regressions in the algorithm with the corresponding robust fits.
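The four IS steps can be sketched as follows. This is a minimal Python illustration under assumed choices (LS via numpy, a long-AR order of roughly √n), not the authors' implementation; all function names are ours.

```python
import numpy as np

def long_ar_residuals(y, p):
    # Step 1: residuals from a long AR(p) least-squares fit,
    # zero-padded so index t of the result lines up with y[t].
    n = len(y)
    X = np.column_stack(
        [np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)]
    )
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return np.concatenate([np.zeros(p), y[p:] - X @ coef])

def lagged_design(e, q):
    # Columns e_{t-1}, ..., e_{t-q} for t = q, ..., n-1.
    n = len(e)
    return np.column_stack([e[q - j:n - j] for j in range(1, q + 1)])

def is_fit(y, q, p=None):
    """Sketch of the Innovative Substitution recursion."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if p is None:
        p = max(q, int(np.sqrt(n)))  # assumed long-AR order rule
    e_hat = long_ar_residuals(y, p)
    # Step 2: LS regression of the observations on lagged AR residuals.
    theta0, *_ = np.linalg.lstsq(lagged_design(e_hat, q), y[q:], rcond=None)
    # Step 3: rebuild the errors recursively from Eq. (6).
    e_new = np.zeros(n)
    for t in range(n):
        e_new[t] = y[t] - sum(
            theta0[j - 1] * e_new[t - j] for j in range(1, q + 1) if t - j >= 0
        )
    # Step 4: final LS regression on the rebuilt lagged errors.
    theta, *_ = np.linalg.lstsq(lagged_design(e_new, q), y[q:], rcond=None)
    return theta

# Recover theta_1 = 0.5 from a simulated MA(1) series.
rng = np.random.default_rng(1)
eps = rng.normal(size=3001)
y = eps[1:] + 0.5 * eps[:-1]
theta_hat = is_fit(y, 1)
```

The MIS variants are obtained by swapping the `lstsq` call in Step 4 (MIS1) or in all steps (MIS2) for a robust regression.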
Our empirical results show that these modified estimators yield estimates which lose little efficiency in comparison to either the IS or MLE fits when the random errors have a normal distribution and, further, are much more efficient than either of those estimators when the random errors have distributions with heavier tails than the normal distribution.
For the choice of a robust estimator, consider a moving average model when the error distribution has heavy tails. Due to the MA model's dependency on the past, outliers generated from heavy-tailed distributions become incorporated in both the responses and predictors. Hence, high breakdown estimates are called for. On the other hand, high efficiency is critical, so high breakdown estimates which retain efficiency are preferable. Finally, our study is also concerned with the behavior of the estimates under skewed heavy-tailed distributions, so robust high breakdown estimators which are efficient under skewed as well as symmetric distributions are desirable.
One estimator that satisfies all three of these properties is the HBR estimator proposed by Chang et al. (1999). This is a weighted Wilcoxon estimator using Schweppe-type weights which are functions of both the design (predictor) space and the response space. In the design space, the weights are functions of the robust distances based on the Minimum Covariance Determinant (MCD) proposed by (Rousseeuw and Van Driessen, 1999), and in the response space, they are functions of residuals based on an initial Least Trimmed Squares (LTS) fit. As shown by Chang et al. (1999), the HBR estimates achieve a 50% breakdown point. Further, the HBR influence function is everywhere continuous, bounded, and goes to zero in all directions of the spaces of the predictors and responses. So, it is a high breakdown, robust estimator. Since points of "good" leverage are not necessarily down-weighted, the HBR estimates recover efficiency that is lost by high breakdown fits whose weights depend only on the design space; see the results of the empirical study by Chang et al. (1999). The HBR estimator performed well in the empirical study of robust estimators for autoregressive time-series conducted by Terpstra et al. (2000). It was among the best estimators over many situations with heavy-tailed error structure, including Innovative (IO) and Additive (AO) outliers and skewed as well as symmetric heavy-tailed error distributions.
Our proposed MIS1 procedure is then to use the HBR estimator in the last step of the algorithm instead of the LS estimator. It would seem that replacing all the LS fits in the algorithm with HBR fits would lead to a more efficient estimator in the case of heavy-tailed error distributions. This estimator is our MIS2 estimator.
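The HBR estimator itself is implemented in the R package hbrfit. To illustrate the general MIS idea of swapping an LS step for a robust one, the sketch below substitutes a simple Huber M-estimate computed by iteratively reweighted least squares. This is a stand-in, not the HBR estimator; the tuning constant 1.345 and the MAD-based scale are conventional choices, and the function name is ours.

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50):
    """Huber M-estimate via iteratively reweighted least squares.

    A simple robust stand-in for the regression steps of the MIS
    procedures (the HBR fit itself is in the R package hbrfit).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # LS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        mad = np.median(np.abs(r - np.median(r)))
        s = mad / 0.6745 if mad > 0 else 1.0      # robust scale
        u = r / (k * s)
        # Huber weights: full weight for small residuals, down-weight large ones.
        w = np.where(np.abs(u) <= 1.0, 1.0, 1.0 / np.abs(u))
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Slope recovery with 10% gross outliers in the responses.
rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)
y[:50] += 40.0
X = np.column_stack([np.ones(500), x])
beta = huber_irls(X, y)
```

In the MIS2 procedure, every `lstsq` call of the IS algorithm would be replaced by such a robust solve; in MIS1, only the final one.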
In the next two sections, we present the results of two Monte Carlo studies comparing the MIS1, MIS2, IS, and MLE estimators. The first study investigates the relative efficiencies of the estimators while the second study compares their forecasts.

Monte Carlo Study of Relative Efficiencies
The purpose of this Monte Carlo study is to determine empirically how well the estimators MLE, IS, MIS1 and MIS2 compare in terms of efficiency over situations determined by the three factors: (I) Moving Average Model, (II) Error Distributions, and (III) Sample Sizes. We discuss the setup of our study and then proceed with the results.

Setup of the Monte Carlo Study
The contaminated random error distributions used in the study are the following:

CN: Contaminated Normal. The random error e is of the form:

e = (1 − Iϵ)Z + σIϵW,

where Iϵ has a Bernoulli distribution with probability of success ϵ; W and Z have N(0, 1) distributions; and Iϵ, W, and Z are independent. For the parameters, σ is set at 10, while ϵ is selected from {0.10, 0.20, 0.30}. This is a heavy-tailed error distribution. Based on the lag structure of the MA model, the outliers generated from this distribution become incorporated into the model. These often become "good" points of high leverage. Sometimes these outliers are called Innovative Outliers (IO). This is true of the next distribution, also.

SCN: Skewed Contaminated Normal. The random error e is of the form:

e = (1 − Iϵ)Z + σIϵW,

where Iϵ has a Bernoulli distribution with probability of success ϵ; Z has an N(0, 1) distribution; W has an N(10, 1) distribution; and Iϵ, W, and Z are independent. For the parameters, σ is set at 10, while ϵ is selected from {0.10, 0.20, 0.30}.

NAO: This distribution contains Additive Outliers (AO) generated as discussed in (Terpstra et al., 2000). The underlying process is an N(0, σ2) distribution where σ is selected from {0.5, 1, 2}. The outliers are generated at a rate of 0.2n from an N(30, 100²) distribution.

SCNAO: This is also a distribution with additive outliers. The underlying distribution is the skewed contaminated normal distribution described in situation (SCN). In this case, the outliers are generated at a rate of 0.2n from a skewed contaminated normal distribution with a mean of 100, a standard deviation of 20, and a level of contamination of 30%.
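Under the mixture form above (as reconstructed), the CN and SCN errors differ only in the mean of the contaminating component. The sketch below makes this explicit; the function name and parameterization are ours.

```python
import numpy as np

def cn_errors(n, eps=0.2, sigma=10.0, mu_c=0.0, rng=None):
    """Generate contaminated-normal errors e = (1 - I)Z + sigma*I*W,
    with I ~ Bernoulli(eps), Z ~ N(0, 1), W ~ N(mu_c, 1), all independent.

    mu_c = 0 gives the symmetric CN errors; mu_c = 10 gives the skewed
    SCN errors, per the mixture form reconstructed in the text.
    """
    if rng is None:
        rng = np.random.default_rng()
    contaminate = rng.random(n) < eps
    Z = rng.normal(size=n)
    W = rng.normal(mu_c, 1.0, size=n)
    return np.where(contaminate, sigma * W, Z)

# Symmetric CN sample and skewed SCN sample.
cn = cn_errors(20000, mu_c=0.0, rng=np.random.default_rng(3))
scn = cn_errors(20000, mu_c=10.0, rng=np.random.default_rng(4))
```

With ϵ = 0.2 and σ = 10, the CN errors have variance 0.8·1 + 0.2·100 = 20.8, far heavier-tailed than the standard normal, while the SCN contaminating component is centered at σ·10 = 100, producing strong right skew.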

Results
We compare the MLE, IS, MIS1, and MIS2 estimators using the following metric. For each estimator and situation, we obtain the mean square error (MSE). As our measure of efficiency, we report the ratio of the MSE of the MLE to the MSE of the competing estimator (MIS1, MIS2, or IS).
For each situation, this ratio is an estimate of the ARE between the estimators, so we label them as AREs in the tables reporting the results.
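The direction of the reported ratio is easy to misread, so here is a two-line illustration (the function name is ours):

```python
import numpy as np

def empirical_are(errors_est, errors_mle):
    """Empirical ARE of an estimator relative to the MLE: the ratio
    MSE(MLE) / MSE(estimator), computed over the Monte Carlo estimation
    errors (estimate minus true value). Values > 1 favor the estimator;
    values < 1 favor the MLE."""
    mse = lambda d: float(np.mean(np.square(d)))
    return mse(errors_mle) / mse(errors_est)
```

For example, if the MLE's estimation errors are twice as large as a competitor's, the reported ARE is 4, since the MSEs are ratios of squared errors.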

Comparisons in Terms of Empirical Efficiency
Tables 1-5 present the empirical AREs of the three estimators over all distributions and situations in the study. For each situation, these are the ratios of the MSEs of the estimators. The ratio is always of the form MSE of the Maximum Likelihood Estimator (MLE) to MSE of the competing estimator (MIS1, MIS2, or IS). Hence, values of a ratio less than 1 favor the MLE, while values greater than 1 favor the competing estimator (MIS1, MIS2, or IS). We discuss the results by distribution.
The results for the normal situations are found in the top panel of Tables 1-4 for the MA (1)-MA (4) models, respectively.
Naturally, the MLE estimator outperforms the MIS1 and MIS2 estimators for these models. Generally, though, the MIS1 and MIS2 estimators have empirical efficiencies above 90%. For example, over all of the normal situations, the MIS1 estimator has empirical efficiency ≥ 92% for 90 of the possible 108 situations. In several situations, the MIS1 estimator has higher efficiency than the IS estimator. In most situations, the MIS1 estimator has a slightly higher efficiency than the MIS2 estimator. Although the efficiency of the IS estimator is generally close to 1, the MLE estimator is more efficient in each situation.
The bottom two panels of Tables 1-4 contain the results for the two innovative outlier distributions. The middle panel contains the results for the symmetric contaminated normal distribution. Over all of these situations, the MIS2 estimator is much more efficient than the MLE estimator; generally, it is between 5 (500%) and 8 (800%) times more efficient. For these situations, the MIS2 estimator is always more efficient than the MIS1 estimator, although, generally, the edge is slight. Between the IS and MLE estimators, only in one situation is the IS estimator more efficient; the efficiency of the IS estimator, however, is generally close to 1.

The bottom panel of Tables 1-4 displays the empirical efficiencies for the skewed contaminated normal situations. While the trends seen for the symmetric situations carry over to the skewed situations, the MIS2 efficiency is generally larger than in the symmetric case; the MIS2 is generally between 8 and 12 times more efficient than the MLE estimator. The MIS1 estimator is almost as efficient as the MIS2 estimator. Of the two poorest estimators, the MLE is more efficient than the IS estimator.

Table 5 displays the results for the Additive Outlier (AO) distributions. The left side of the table displays the efficiencies for the symmetric AO situations. For these situations, the MIS2 estimator is much more efficient than the MLE estimator; in many situations, it is over 100 times more efficient. The MIS2 estimator outperforms the MIS1 estimator in every situation. For the skewed AO situations (right side of Table 5), the results are generally the same.

Setup of the Simulation
In our second empirical investigation, we compared the forecasting abilities of the four procedures. For each procedure, we computed one-step-ahead forecasts. For example, for n = 50, for each iteration of the simulation of a situation, 51 observations were generated from the model. The observation y51 was considered the true value and the model was fit using only the first 50 values. The forecast of y51 is then the predicted value ŷ51 based on the fit. The empirical bias is the difference y51 − ŷ51. These differences were recorded, from which Mean Square Errors (MSE) were computed.
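For an MA(1), for instance, the one-step-ahead forecast can be computed by rebuilding the errors from Eq. (6) and scaling the last one by the estimated coefficient. A small sketch, with our own function name:

```python
def ma1_one_step_forecast(y, theta1):
    """One-step-ahead forecast for a fitted MA(1):
    yhat_{n+1} = theta1 * e_n, where the errors e_t are rebuilt
    recursively from the observations via Eq. (6),
    e_t = y_t - theta1 * e_{t-1}, starting from e_0's predecessor = 0.
    """
    e = 0.0
    for value in y:
        e = value - theta1 * e
    return theta1 * e

# With y = [1.0, 2.5, 0.0] and theta1 = 0.5, the rebuilt errors are
# 1, 2, -1, so the forecast is 0.5 * (-1) = -0.5.
yhat = ma1_one_step_forecast([1.0, 2.5, 0.0], 0.5)
```

In the simulation, this forecast is compared against the held-out observation y51 and the squared differences are averaged into an MSE.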
We ran this forecasting simulation for all four estimates: MIS1, MIS2, IS, and MLE. The simulation size for each situation is 1000. The MSEs for all four estimating procedures are recorded in Tables 6-10.

Summary of the Results of the Simulation
Tables 6-10 present the results of the forecasting simulation. Each table displays, over its situations, the MSE of each procedure's (MIS1, MIS2, IS, or MLE) predictions of the "true" values. We briefly discuss these results.
The top panel of Table 6 displays the MSEs for the normal situations of the MA(1) models. Overall, the MLE's prediction performs best in terms of MSE; but, in 6 of these situations, the MIS2 prediction has lower MSE than the MLE prediction. In the other situations, the MSEs of the MIS2 procedure are quite close to those of the MLE procedure. The MSEs for the IS and MLE procedures are quite close, but in every normal situation, the MLE's predictions outperform those of the IS predictions. For the two robust procedures, in every situation, the MSEs of the MIS2 predictions are lower than those of the MIS1 predictions. The results are the same for the MA(2), MA(3), and MA(4) normal situations found in the top panels of the respective Tables 7-9. The MLE and MIS2 procedures outperform the IS and MIS1 procedures, respectively, while the predictions of the MLE procedure outperform the predictions of the MIS2 procedure. Note that the MSEs tend to increase as σ increases and as the MA coefficients increase.
The bottom two panels of Tables 6-10 contain the results for the innovative outlier situations for the MA(1)-MA(4) models, respectively. The results for the symmetric contaminated normal situations are found in the middle panel. The MIS2 and MIS1 predictions are much more precise than those of the IS and MLE procedures for each of these situations. The ratio of the MSEs, MLE to MIS2, ranges from 5 to 10 across all MA models. Of the two, the predictions of the MIS2 procedure are slightly more precise than the MIS1 procedure. Although both the MLE and IS forecasts are poor in terms of efficiency in these situations, the MLE predictions are more precise than those of the IS procedure. The MSEs for all methods tend to increase with increasing contamination.
The results are similar for the skewed contaminated normal situations in the bottom panel of these tables. The predictions of the MIS2 and MIS1 procedures outperform those of the IS and MLE procedures with similar ranges of precision. In terms of precision, the best predictions are by the MIS2 method but the MIS1 is only slightly less precise.

Example
A real data set from (Box et al., 2008) is used as an example to compare the forecasts made by the MIS1, MIS2, IS, and MLE procedures. This time series consists of the daily common stock closing prices of IBM (Series B) from May 17, 1961 to November 2, 1962; it contains 369 observations. Box and Jenkins identified the series as an IMA(1,1) model, i.e., an MA(1) model for the first differences. We used the CRAN package tsoutliers to detect outliers in the series; see (de Lacalle, 2017) and (Chen and Liu, 1993). Three outliers were detected, at observations 239, 258, and 270. The first is a Temporary Change (TC) while the other two are Additive Outliers (AO).
We fit the first difference MA(1) model to this series using all four procedures: MIS1, MIS2, IS, and MLE. Table 11 displays the estimates of the MA(1) coefficient θ and the variances of the residuals for the procedures. We also computed one-step ahead forecasts of the last data point and the mean square difference between each forecast and that point.
The estimates of the MA(1) coefficient θ are similar across procedures. The residual variance estimates of the two MIS fits are similar, and both are much smaller than the residual variances of the IS and MLE fits; in this sense, the MIS fits are more precise than the IS and MLE fits. The forecasts of the four procedures are also similar. The mean squared differences of the MIS procedures are slightly smaller than those of the IS and MLE procedures.

Conclusion
In this study, we have presented two robust modifications of the recursive Innovative Substitution (IS) algorithm for the fitting of the Moving Average (MA) time-series of order q. The IS algorithm is a series of Least Squares (LS) regressions. The first modification that we proposed, MIS1, replaces the last regression in the IS algorithm with a robust fit, while the second, MIS2, replaces all the LS fits with robust fits. For the robust fit, we selected the High Breakdown Rank-based (HBR) fit. This estimator has bounded influence in both the Y and X spaces, attains a 50% breakdown point, and is efficient for skewed as well as symmetrically distributed random errors.
The main goal of this study is to investigate the empirical properties of these modified IS procedures. Two large Monte Carlo studies were carried out to investigate, respectively, the efficiencies of the estimates and forecasts over a wide variety of MA models, MA(1) through MA(4), comparing them with the IS and MLE procedures. Random error distributions included the normal as well as symmetric and skewed contaminated normal distributions with various rates of contamination. These were formulated to produce both Innovative (IO) and Additive (AO) outliers.
Not surprisingly, for normal situations, the MLE estimator was the most efficient. In most normal situations, though, the efficiencies of the MIS1 relative to the MLE exceeded 92%. For these normal situations, in general, the MIS1 was slightly more efficient than the MIS2, while the MLE was slightly more efficient than the IS estimator. For the symmetric contaminated normal situations, the MIS2 estimator is generally 5 (500%) to 8 (800%) times more efficient than the MLE estimator. In skewed contaminated normal situations, the edge increases to 8-12 times more efficient than the MLE estimator. These are IO situations. For the AO situations, the MIS2 estimator is even more efficient; in many of these situations, it is 100 times more efficient than the MLE procedure. The MIS1 also has high efficiency relative to the MLE estimator in all non-normal situations. The MIS2, however, is always more efficient than the MIS1 estimator. While the MIS2 estimator dominates the MIS1 estimator in terms of efficiency in non-normal situations, the edge is much less than in its comparisons with the MLE estimator. The results are essentially the same regardless of sample size.
For normal situations, the MLE forecasts are more efficient than those of the other procedures but its edge over the MIS forecasts is slight. In terms of forecasting, the MIS2 forecasts are slightly more efficient than those of the MIS1 procedure. For non-normal situations, the MIS forecasts are much more efficient than those of the MLE. For the AO situations, this edge increases, often to 100-fold. The MIS2 procedure dominates the MIS1 procedure in all situations but as in the normal case, the edge is slight.
The IS recursive algorithm for fitting an MA model is a series of LS regressions. In this study, we considered the simple modification of replacing these LS regressions with regressions based on a high breakdown, efficient estimator. We proposed two modifications, MIS1 (replacing only the last LS regression) and MIS2 (replacing all the LS regressions). In our simulation studies, both modifications showed high empirical efficiency relative to the MLE estimator for MA models with normally distributed random errors. For symmetric and skewed contaminated normal situations (IO and AO outliers), the MIS estimators were much more efficient than the MLE estimators. The results of our forecasting study were similar. For the non-normal situations, the MIS2 estimator was more efficient than the MIS1 estimator, while for the normal error situations the MIS1 had a slight edge. In practice, of the two, we recommend the MIS2 estimator.