A Simple Robust Control Chart Based on MAD

Control charts are among the most powerful tools for detecting aberrant behavior in industrial processes. The Shewhart S-control chart is one of the most widely used statistical process control techniques for controlling process variability. It rests on the fundamental assumption that the underlying distribution of the quality characteristic is normal. When this normality assumption is not met, robust methods are among the most commonly preferred statistical tools. We present a simple approach to robust estimation of the process standard deviation σ based on a highly robust scale estimator, the median absolute deviation from the sample median (MAD). The proposed method provides an alternative to the Shewhart S-control chart. A numerical example is given and a Monte Carlo simulation study is conducted to illustrate the performance of the proposed method and compare it with that of the Shewhart method. The proposed robust method performs better than the Shewhart method and has good properties for heavy-tailed distributions and moderate sample sizes.


INTRODUCTION
Scale estimators are very important in many statistical applications. The most common scale estimator is the sample standard deviation, S, which many statistical quality control textbooks recommend for estimating the process standard deviation σ of a normally distributed random variable. Unfortunately, S is not necessarily the most efficient or meaningful estimator of scale for skewed and leptokurtic distributions, and it is notably non-robust to even slight deviations from normality [1]. The most appropriate measure of scale often depends on the estimator's efficiency, on the performance of inferential methods in realistic settings and on its effectiveness in describing the most interesting aspect of the distribution's variation. The sample standard deviation has good efficiency in platykurtic and moderately leptokurtic distributions, but the classic inferential methods for S may perform poorly in realistic nonnormal distributions [2]. We therefore seek a scale estimator that is robust, explicit and easily computable as an alternative to S. The median absolute deviation from the sample median (MAD) is a more meaningful measure of variation and may be preferred to S for certain nonnormal distributions.

Shewhart S-control chart:
The Shewhart S-control chart, one of the most widely used statistical process control techniques, was developed to monitor the standard deviation σ of a production process in order to control process variability. Standard practice is to estimate σ from the average of the subgroup standard deviations, $\hat{\sigma} = \bar{S}/c_4$, where the constant $c_4$ makes the estimator unbiased for σ. The fundamental assumption of the S-control chart is that the underlying distribution of the quality characteristic is normal. Unfortunately, many processes that occur in practice do not follow the normal distribution, and because the sample standard deviation S is non-robust to even slight deviations from normality, alternatives to the Shewhart S-control chart are needed.
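The estimate $\hat{\sigma} = \bar{S}/c_4$ and the resulting limits can be sketched in Python. This is a minimal illustration under the usual normal-theory assumptions, not code from the paper. The function names `c4` and `shewhart_s_limits` are our own. $c_4$ is computed from its gamma-function form $c_4 = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. The limits are placed three estimated standard deviations of S, $3\hat{\sigma}\sqrt{1-c_4^2}$, on either side of $\bar{S}$, which gives the familiar $B_3\bar{S}$ and $B_4\bar{S}$.

```python
import math
import random
import statistics


def c4(n: int) -> float:
    """Unbiasing constant: E[S] = c4(n) * sigma for normal samples of size n."""
    # lgamma avoids overflow of Gamma(n/2) for large n.
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def shewhart_s_limits(subgroups):
    """Return (LCL, CL, UCL) of a Shewhart S chart built from m subgroups of equal size n."""
    n = len(subgroups[0])
    s_bar = statistics.fmean([statistics.stdev(g) for g in subgroups])  # average subgroup S
    k = c4(n)
    sigma_hat = s_bar / k                                   # sigma-hat = S-bar / c4
    half_width = 3.0 * sigma_hat * math.sqrt(1.0 - k * k)   # 3 x estimated sd of S
    lcl = max(0.0, s_bar - half_width)                      # B3 * S-bar (B3 truncated at 0)
    ucl = s_bar + half_width                                # B4 * S-bar
    return lcl, s_bar, ucl


# Usage: 30 simulated in-control subgroups of size 10 (illustrative data only).
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(30)]
print(round(c4(5), 4), round(c4(10), 4))  # 0.94 0.9727
print(shewhart_s_limits(data))
```

For n = 5 the lower factor $B_3$ is negative and is therefore truncated to zero. For n = 10 both limits are positive.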

Robust Methods:
Robust methods are among the most commonly used statistical methods when the underlying normality assumption is violated. They offer useful and viable alternatives to traditional statistical methods and can provide more accurate results, often yielding greater statistical power and increased sensitivity. They also remain efficient if the normality assumption holds. By a robust estimator, we mean an estimator that is insensitive to changes in the underlying distribution and resistant to the presence of outliers. A robust estimator is considered good if it has the following properties [3,4]:
- high efficiency;
- a high breakdown point (the maximum fraction of outliers an estimator can cope with);
- a redescending influence function (the influence function measures how an estimator reacts to a small fraction of outliers);
- a low gross-error sensitivity (the worst influence that a small amount of contamination of fixed size can have on the value of the estimator).

In this paper, we restrict attention to estimators that have an explicit formula, always yield a unique estimate, have a 50% breakdown point and a bounded influence function, and are easily computable with little computation time.

Sample Standard Deviation:
The sample standard deviation, S, is the most commonly used measure of scale. It is defined as the square root of the sum of squared deviations from the sample mean $\bar{x}$, divided by $n-1$: $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$. Its main advantage is that it can be regarded as truly representative of the data, since all data values are taken into account in its calculation. Its main disadvantage is that it is non-robust to slight deviations from normality and can easily be influenced by outliers. The breakdown point of the sample standard deviation for a sample of size n is merely 1/n; that is, a single outlier can destroy it [5].
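A quick way to see the 1/n breakdown point is to replace one observation by increasingly extreme values and watch S follow it. The sketch below uses only Python's standard `statistics` module. The ten measurements and the helper name `stdev_with_one_outlier` are made up for illustration.

```python
import statistics

# Ten made-up measurements centred on 10 (illustrative only).
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]


def stdev_with_one_outlier(sample, outlier):
    """Sample standard deviation after replacing the first observation by `outlier`."""
    return statistics.stdev([outlier] + list(sample[1:]))


print(f"clean S = {statistics.stdev(clean):.3f}")  # clean S = 0.183
for outlier in (20.0, 100.0, 1000.0):
    # One contaminated point out of n = 10 (a fraction 1/n) moves S as far as the outlier goes.
    print(f"outlier {outlier:>7}: S = {stdev_with_one_outlier(clean, outlier):.3f}")
```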

Median Absolute Deviation:
The median absolute deviation from the sample median (MAD) is a robust scale estimator that satisfies the above combination of requirements. Because of its good properties, it will be used as an alternative to the sample standard deviation, S, in estimating the process standard deviation σ. The aim is a simple univariate robust control chart for situations where the normality assumption of the Shewhart S-control chart is not met.

MATERIALS AND METHODS
The median absolute deviation from the sample median (MAD) is a much more robust scale estimator than the sample standard deviation. It measures the deviation of the data from the sample median. It was first promoted by Hampel [6], who attributed it to Gauss. The estimator is simple and easy to compute, and it is often used as an initial value for the computation of more efficient robust estimators. The MAD for a random sample of n observations $x_1, x_2, \ldots, x_n$ is defined as $\mathrm{MAD} = \operatorname{med}_{1 \le i \le n} \lvert x_i - \mathrm{MD} \rvert$, where MD is the sample median. The statistic $b_n \mathrm{MAD}$ is an unbiased estimator of σ if $x_1, x_2, \ldots, x_n$ are normally distributed [7]. The correction factor $b_n$ is given in Table 1 for different values of n.

The main property of the MAD is its maximal 50% breakdown point, twice that of the interquartile range (IQR). The influence function of the MAD is bounded, with the sharpest possible bound among all scale estimators. The gross-error sensitivity of the MAD is 1.167, the smallest value any scale estimator can attain at the normal distribution. The efficiency of the MAD at the normal distribution is 37%.

Let $x_{i1}, x_{i2}, \ldots,$ … The values of the control limit factors $c_4^*$, $B_5^*$ and $B_6^*$ are calculated and given in Table 1. Once the lower control limit (LCL), the upper control limit (UCL) and the central line (CL) have been calculated, the values $S_i$, $i = 1, 2, \ldots, m$, are plotted on the chart. If any plotted $S_i$ falls outside the control limits, the process is considered to be out of control.
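The MAD and its scaling can be sketched as follows. The finite-sample factors $b_n$ of Table 1 are not reproduced here. As an assumption, the sketch therefore shows two stand-ins:
- the large-sample constant $1/\Phi^{-1}(3/4) \approx 1.4826$;
- a Monte Carlo estimate of $b_n = 1/E[\mathrm{MAD}]$ under N(0, 1), which by scale equivariance makes $b_n\,\mathrm{MAD}$ unbiased for σ at the normal.

The names `mad`, `estimate_bn` and `ASYMPTOTIC_B` are hypothetical. The last loop illustrates the 50% breakdown point: with n = 10 the MAD stays bounded with up to four arbitrarily large observations and breaks down with five.

```python
import random
import statistics

# Large-sample constant 1 / Phi^{-1}(3/4): makes 1.4826 * MAD consistent for sigma at the normal.
ASYMPTOTIC_B = 1.0 / statistics.NormalDist().inv_cdf(0.75)


def mad(sample):
    """Median absolute deviation from the sample median (unscaled)."""
    md = statistics.median(sample)
    return statistics.median([abs(x - md) for x in sample])


def estimate_bn(n, reps=20_000, seed=12345):
    """Monte Carlo estimate of b_n = 1 / E[MAD] for N(0, 1) samples of size n.

    Assumption: a simulation stand-in for the tabulated factors, not the paper's values.
    """
    rng = random.Random(seed)
    mads = [mad([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    return 1.0 / statistics.fmean(mads)


clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
print(f"MAD = {mad(clean):.3f}, 1.4826*MAD = {ASYMPTOTIC_B * mad(clean):.3f}")
print(f"b_10 (simulated) = {estimate_bn(10):.3f}")

# Breakdown check: replace the first k of the 10 points by a huge value.
for k in range(7):
    contaminated = [1e6] * k + clean[k:]
    print(k, round(mad(contaminated), 3))
```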

RESULTS AND DISCUSSION
Numerical example: Randomly generated data from the normal distribution N(0,1) are used to illustrate the two methods. The data consist of m = 30 subgroups of n = 10 observations each. Table 2 gives the control limits, central line and length for the S chart and the proposed robust MAD control chart. Both methods lead to the same state of control and approximately the same length, with the proposed robust method giving slightly wider limits. Only minor differences in the calculated control limits are observed; this is expected, since the data are normally distributed. Figure 1 indicates that the process is out of control on both types of control chart.
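The numerical example can be mimicked with a short simulation. This is a hypothetical sketch, not the paper's procedure. Because the factors $c_4^*$, $B_5^*$ and $B_6^*$ of Table 1 are not reproduced here, the robust chart below estimates σ as a simulated $b_n$ times the average subgroup MAD. It then uses normal-theory limits $(c_4 \mp 3\sqrt{1-c_4^2})\,\hat{\sigma}$ (the $B_5$/$B_6$ form) around that estimate. The Shewhart chart uses $\hat{\sigma} = \bar{S}/c_4$. Both charts plot the subgroup standard deviations $S_i$. The seed and data are arbitrary, so the printed numbers will not match Table 2.

```python
import math
import random
import statistics


def c4(n):
    """E[S] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def mad(sample):
    """Median absolute deviation from the sample median."""
    md = statistics.median(sample)
    return statistics.median([abs(x - md) for x in sample])


def simulated_bn(n, reps=10_000, seed=1):
    """Hypothetical stand-in for Table 1: b_n = 1 / E[MAD] under N(0, 1), by simulation."""
    rng = random.Random(seed)
    return 1.0 / statistics.fmean([mad([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)])


def s_chart_limits(sigma_hat, n):
    """(LCL, CL, UCL) for S_i around a sigma estimate, using normal-theory B5/B6-type factors."""
    k = c4(n)
    spread = 3.0 * math.sqrt(1.0 - k * k)
    return max(0.0, (k - spread) * sigma_hat), k * sigma_hat, (k + spread) * sigma_hat


def build_charts(subgroups):
    """Limits of the Shewhart S chart and of the (hypothetical) MAD-based chart."""
    n = len(subgroups[0])
    sigma_s = statistics.fmean([statistics.stdev(g) for g in subgroups]) / c4(n)
    sigma_mad = simulated_bn(n) * statistics.fmean([mad(g) for g in subgroups])
    return {"S": s_chart_limits(sigma_s, n), "MAD": s_chart_limits(sigma_mad, n)}


def points_outside(subgroups, limits):
    """Indices i whose subgroup standard deviation S_i lies outside [LCL, UCL]."""
    lcl, _, ucl = limits
    return [i for i, g in enumerate(subgroups) if not lcl <= statistics.stdev(g) <= ucl]


rng = random.Random(2024)
data = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(30)]  # m = 30, n = 10
for name, (lcl, cl, ucl) in build_charts(data).items():
    outside = points_outside(data, (lcl, cl, ucl))
    print(f"{name:>3}: LCL={lcl:.3f} CL={cl:.3f} UCL={ucl:.3f} length={ucl - lcl:.3f} outside={outside}")
```

Placing the known-σ factors around a robust $\hat{\sigma}$ keeps the two charts directly comparable: only the estimate of σ differs, not the plotted statistic.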

Monte Carlo Simulation Study: As indicated by Langenberg and Iglewicz [8], processes generated by distributions with heavier tails than the normal are of special interest, because such processes tend to have more than the expected number of points falling outside the control limits. It is therefore instructive to study the consequences of using the proposed robust method for nonnormal observations. To evaluate the performance of the proposed robust control chart, computer programs written in FORTRAN were used. The performance of the proposed robust control chart is compared with that of the Shewhart S-control chart under … Table 3, while their variances are given in Table 4. These values are used in determining the control limit values given later. It is not appropriate to compare the measures directly, since each measures something different. The sample standard deviation, S, always has the largest variance for all distributions and all sample sizes considered, although the differences are not large. Table 5 presents the confidence interval width and Table 6 presents the number of points falling outside the control limits for each method. For the normal distribution, both methods lead to approximately the same length and to the same number of points falling outside the control limits for the different sample sizes. As the tail weight of the distribution increases, the proposed method leads to narrower control limits and a larger number of points falling outside the control limits.
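The effect of tail weight can be explored with a small experiment. This is a hypothetical setup, not the paper's FORTRAN programs:
- m = 30 subgroups of size n = 10 are drawn from the normal, logistic, double exponential (Laplace) and Cauchy distributions, generated by inverse-CDF transforms of uniform draws;
- both charts are built from the same data;
- the number of $S_i$ values outside each chart's limits is averaged over replications.

The robust chart uses stand-in factors in place of Table 1: a simulated $b_n$ and normal-theory $B_5$/$B_6$-type limits. All helper names are ours. Because of these stand-ins, the output need not reproduce Tables 5 and 6.

```python
import math
import random
import statistics


def c4(n):
    """E[S] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def mad(sample):
    """Median absolute deviation from the sample median."""
    md = statistics.median(sample)
    return statistics.median([abs(x - md) for x in sample])


def sd(x):
    """Sample standard deviation S (divisor n - 1)."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))


def open_uniform(rng):
    """Uniform draw on (0, 1), excluding 0 so the inverse-CDF transforms stay finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def logistic(rng):
    u = open_uniform(rng)
    return math.log(u / (1.0 - u))


def laplace(rng):  # double exponential
    u = open_uniform(rng)
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))


def cauchy(rng):
    return math.tan(math.pi * (open_uniform(rng) - 0.5))


SAMPLERS = {"normal": lambda rng: rng.gauss(0.0, 1.0), "logistic": logistic,
            "double exponential": laplace, "cauchy": cauchy}


def average_points_outside(dist, n=10, m=30, reps=200, seed=7):
    """Average number of the m Phase I S_i values outside each chart's limits."""
    rng = random.Random(seed)
    draw = SAMPLERS[dist]
    k = c4(n)
    spread = 3.0 * math.sqrt(1.0 - k * k)
    # Hypothetical stand-in for Table 1: b_n = 1 / E[MAD] under N(0, 1), by simulation.
    bn = 1.0 / statistics.fmean([mad([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(5000)])
    totals = {"S": 0, "MAD": 0}
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(n)] for _ in range(m)]
        s_values = [sd(g) for g in groups]
        sigma_hat = {"S": statistics.fmean(s_values) / k,
                     "MAD": bn * statistics.fmean([mad(g) for g in groups])}
        for name, sigma in sigma_hat.items():
            lcl, ucl = max(0.0, (k - spread) * sigma), (k + spread) * sigma
            totals[name] += sum(1 for s in s_values if s < lcl or s > ucl)
    return {name: total / reps for name, total in totals.items()}


for dist in SAMPLERS:
    print(dist, average_points_outside(dist))
```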

The Average Run Length Simulation Study:
The criterion used to evaluate the performance of a control chart is how quickly it detects a change in the process. This is measured by the Average Run Length (ARL), the average number of points plotted until the first point falls outside the control limits. Under the normality assumption, for Shewhart control charts such as the S chart, about 370.4 points are expected to be plotted before a point falls outside the 3σ control limits. This value is the reciprocal of 0.0027, the probability that a normally distributed statistic falls outside 3σ limits.

The in-control average run length:
If the process is in control, we want the in-control average run length, $\mathrm{ARL}_0$, to be large. Let N(µ, σ²) denote the normal distribution with mean µ and variance σ². Sets of m = 20 subgroups of n = 5 and n = 10 observations were generated from the N(0, 1) distribution, and the control limits for the S-control chart and the proposed MAD control chart were constructed. After the control limits were determined, random N(0, 1) subgroups of size n were generated. The S statistic was computed for each subgroup and compared with the control limits of both control charts. The number of subgroups required for the S statistic to exceed the control limits was recorded as a run length observation, $RL_i$. Runs not signaling by the 25,000th subgroup were recorded as 25,000. The same procedure is used to compare the Shewhart S-control chart and the proposed MAD control chart for the logistic, double exponential and Cauchy distributions. A control chart that is robust to the normality assumption should exhibit a relatively stable $\mathrm{ARL}_0$ across these four distributions. This process is repeated 1000 times, and the results of this simulation study are given in Table 7. The $\mathrm{ARL}_0$ was calculated as $\mathrm{ARL}_0 = \frac{1}{1000}\sum_{i=1}^{1000} RL_i$. From Table 7, we note that the $\mathrm{ARL}_0$ of the proposed robust control chart appears stable for all distributions studied, with very close $\mathrm{ARL}_0$ values.
The $\mathrm{ARL}_0$ of the Shewhart S-control chart, by contrast, is affected by the distribution of the data, with noticeably different values across distributions. In general, the proposed robust method based on the MAD estimator performs better, with $\mathrm{ARL}_0$ values very close to the nominal value of 370.4.
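The in-control run-length procedure can be sketched as below. It is a hypothetical re-implementation under these assumptions:
- the robust chart uses a simulated $b_n$ and normal-theory limit factors in place of Table 1;
- the helper names are ours;
- the demonstration call uses far fewer replications and a lower cap than the study so that it runs in seconds, while the defaults of `arl0` follow the study's 1000 replications and 25,000-subgroup cap.

Its output need not reproduce Table 7.

```python
import functools
import math
import random
import statistics


def c4(n):
    """E[S] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def mad(sample):
    """Median absolute deviation from the sample median."""
    md = statistics.median(sample)
    return statistics.median([abs(x - md) for x in sample])


def sd(x):
    """Sample standard deviation S (divisor n - 1)."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))


@functools.lru_cache(maxsize=None)
def simulated_bn(n, reps=5000, seed=1):
    """Hypothetical stand-in for Table 1: b_n = 1 / E[MAD] under N(0, 1), by simulation."""
    rng = random.Random(seed)
    return 1.0 / statistics.fmean([mad([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)])


def open_uniform(rng):
    u = rng.random()
    while u == 0.0:  # exclude 0 so the transforms below stay finite
        u = rng.random()
    return u


def logistic(rng):
    u = open_uniform(rng)
    return math.log(u / (1.0 - u))


def laplace(rng):  # double exponential
    u = open_uniform(rng)
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))


def cauchy(rng):
    return math.tan(math.pi * (open_uniform(rng) - 0.5))


SAMPLERS = {"normal": lambda rng: rng.gauss(0.0, 1.0), "logistic": logistic,
            "double exponential": laplace, "cauchy": cauchy}


def arl0(dist, method, n=5, m=20, reps=1000, cap=25_000, seed=3):
    """In-control ARL of the S statistic: mean of `reps` run lengths RL_i, each capped at `cap`."""
    rng = random.Random(seed)
    draw = SAMPLERS[dist]
    k = c4(n)
    spread = 3.0 * math.sqrt(1.0 - k * k)
    run_lengths = []
    for _ in range(reps):
        phase1 = [[draw(rng) for _ in range(n)] for _ in range(m)]  # m subgroups to set limits
        if method == "S":
            sigma_hat = statistics.fmean([sd(g) for g in phase1]) / k
        else:  # hypothetical MAD-based estimate of sigma
            sigma_hat = simulated_bn(n) * statistics.fmean([mad(g) for g in phase1])
        lcl, ucl = max(0.0, (k - spread) * sigma_hat), (k + spread) * sigma_hat
        rl = cap  # runs that do not signal by the cap are recorded as the cap
        for t in range(1, cap + 1):
            s = sd([draw(rng) for _ in range(n)])
            if s < lcl or s > ucl:
                rl = t
                break
        run_lengths.append(rl)
    return statistics.fmean(run_lengths)  # ARL0 = (1 / reps) * sum of RL_i


# Small settings so the sketch runs quickly; the study used reps=1000 and cap=25,000.
for dist in SAMPLERS:
    print(dist, {meth: arl0(dist, meth, reps=10, cap=1000) for meth in ("S", "MAD")})
```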

The out-of-control average run length:
If the process is out of control, we want the out-of-control average run length, $\mathrm{ARL}_1$, to be small. The control limits for the $\mathrm{ARL}_1$ study are based on the N(0, 1) distribution. The observations used to calculate the statistic S come from a normal distribution with mean µ = 0 and standard deviation λσ, where λ represents the size of the shift, so that the standard deviation shifts from σ to λσ. Without loss of generality, … Tables 8-11 show the values of $\mathrm{ARL}_1$ for the proposed robust MAD and Shewhart S control charts. From them, we conclude that when the data are normal, the robust control chart is only slightly less efficient than the Shewhart S control chart at detecting shifts for all considered values of λ. This loss in efficiency becomes small as the sample size increases. Hence, when normality holds, the Shewhart S control chart is recommended. When the data come from a heavy-tailed distribution, such as the double exponential and logistic distributions, the $\mathrm{ARL}_1$ of the proposed robust control chart is generally smaller than that of the Shewhart S control chart. The proposed chart detects shifts more quickly than the S control chart for all values of λ and sample sizes considered. Finally, when the data come from an extremely heavy-tailed distribution, as in the case of the Cauchy distribution, the $\mathrm{ARL}_1$ of the proposed robust control chart is still generally smaller than that of the Shewhart S control chart in all considered cases. Therefore, when a heavy-tailed nonnormal distribution is present, the proposed robust control chart is recommended as an alternative to the Shewhart S control chart.

Table 10: The values of ARL1 for S and the robust MAD control charts from the DE distribution
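The out-of-control case differs from the in-control one only in where the monitored subgroups come from. The limits are estimated from N(0, 1) data, and the monitored subgroups are drawn from N(0, λ²), so their standard deviation is shifted from σ = 1 to λσ. This is a minimal sketch under stated assumptions: a simulated $b_n$ and normal-theory limit factors stand in for Table 1, the helper names are hypothetical, and only normal data are used.

```python
import functools
import math
import random
import statistics


def c4(n):
    """E[S] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def mad(sample):
    """Median absolute deviation from the sample median."""
    md = statistics.median(sample)
    return statistics.median([abs(x - md) for x in sample])


def sd(x):
    """Sample standard deviation S (divisor n - 1)."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))


@functools.lru_cache(maxsize=None)
def simulated_bn(n, reps=5000, seed=1):
    """Hypothetical stand-in for Table 1: b_n = 1 / E[MAD] under N(0, 1), by simulation."""
    rng = random.Random(seed)
    return 1.0 / statistics.fmean([mad([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)])


def arl1(lam, method, n=5, m=20, reps=1000, cap=25_000, seed=11):
    """Out-of-control ARL: limits from N(0, 1) data, monitored subgroups from N(0, lam**2)."""
    rng = random.Random(seed)
    k = c4(n)
    spread = 3.0 * math.sqrt(1.0 - k * k)
    run_lengths = []
    for _ in range(reps):
        phase1 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
        if method == "S":
            sigma_hat = statistics.fmean([sd(g) for g in phase1]) / k
        else:  # hypothetical MAD-based estimate of sigma
            sigma_hat = simulated_bn(n) * statistics.fmean([mad(g) for g in phase1])
        lcl, ucl = max(0.0, (k - spread) * sigma_hat), (k + spread) * sigma_hat
        rl = cap
        for t in range(1, cap + 1):
            # Shifted process: standard deviation lam * sigma with sigma = 1.
            s = sd([rng.gauss(0.0, lam) for _ in range(n)])
            if s < lcl or s > ucl:
                rl = t
                break
        run_lengths.append(rl)
    return statistics.fmean(run_lengths)


for lam in (1.5, 2.0, 3.0):
    print(lam, {meth: round(arl1(lam, meth, reps=50, cap=5000), 2) for meth in ("S", "MAD")})
```

Larger shifts λ should give shorter run lengths. Comparing the two entries for each λ mirrors, in miniature and only for normal data, the comparison reported in Tables 8-11.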

CONCLUSION
This article has presented a simple robust univariate variables control chart for monitoring process variability, together with the table of factors needed to compute the control limits and the central line. The proposed robust control chart uses the median absolute deviation from the sample median (MAD) to estimate the process standard deviation σ for the S control chart. The results of the numerical example show that the proposed robust method performs approximately the same as the Shewhart method for normal data, although it leads to wider control limits. True robust limits should be slightly wider in the case of the normal distribution [9]. The simulation studies show that the proposed robust method has good properties for heavy-tailed distributions and moderate sample sizes. In these situations it outperforms the Shewhart method, which supports the preference for such control charts. Finally, while we give an alternative to the Shewhart S control chart based on the MAD, other robust alternatives to the MAD could also be applied to monitor process variability. Examples are the $S_n$ and $Q_n$ estimators proposed by Rousseeuw and Croux [7], whose performance could be compared with that of the method proposed in this study and of the Shewhart S control chart.