On Three-Way Unbalance Nested Analysis of Variance

Abstract: Problem statement: In this study, we give a simple, analytically tractable procedure for solving three-way unbalanced nested Analysis of Variance (ANOVA). In many realistic situations, an unbalanced design is unavoidable because of natural constraints and missing data. Approach: Here, we present a comprehensive approach for addressing problems arising from unbalanced nested ANOVA. We consider the F-statistics under the different models. Results: Special attention is given to the construction of approximate F-tests where an exact F-test does not exist. Pseudo-degrees of freedom are derived using a Satterthwaite-type approximation. Conclusion: In all derivations, we assume that the effects act independently and that the mean squares are independent. A numerical example is given to illustrate the solution procedure.


INTRODUCTION
This study is intended as a tutorial for those wishing to inform themselves about three-way unbalanced nested ANOVA. It focuses on the background understanding of model parameter estimation, derivation of the sums of squares of effects, derivation of the expected mean squares of effects, setting up of variance components and construction of an approximate F-statistic where an exact F-test does not exist. These tools are not new, but insight into how they are applied to three-way unbalanced nested ANOVA would open new vistas in our way of handling unbalanced hierarchical arrangements. The study is made to be analytically and computationally accessible. Readers need only some prior knowledge of two-way balanced nested design; see, for example, Montgomery (2008) and Dowdy and Chilko (2004) for a discussion of balanced nested designs.
An experiment with three factors A, B and C is said to be a three-way unbalanced nested design if one factor, say B, is nested within another factor, say A, and factor C is nested within factor B, such that each level of A has $b_i$ levels of B, each level of B has $c_{ij}$ levels of C and $n_{ijk}$ observations are drawn from each level of C. It is pertinent to mention that this arrangement does not permit interaction between factors. An analysis of variance layout is called unbalanced if it has unequal subclass numbers. In addition, an analysis of variance model is said to be unbalanced if the variance of the difference between any two treatments is not a constant but depends on the treatments.
Many experimental situations could lead to a nested arrangement. Thus, this kind of design has found extensive applications in industry, the biological sciences and clinical studies. The analysis of a nested design is difficult and the problem is further complicated when the design is unbalanced. Most statistical software packages now incorporate commands and guidelines for carrying out computations on unbalanced nested analysis of variance. Kashiani and Saleh (2010) discussed three methods of estimating variance components for the mixed model. There are, however, several methods for estimating variance components when the design is unbalanced, and each method influences the corresponding approximate F-test. Seeger (1970) gave a method for estimating variance components in unbalanced designs. He used unweighted means in his estimates and showed that these estimates are unbiased. Bush and Anderson (1963) considered numerical comparisons between the variances of variance component estimates obtained by different methods. Tietjen and Moore (1968) developed a fast procedure for computing approximate F-tests in unbalanced nested analysis of variance. Sahai and Ojeda (2004) discussed unbalanced nested analysis of variance for the random effect model.
This study aims at providing background knowledge on three-way unbalanced nested analysis of variance in a simple, straightforward, self-contained account of the underlying theory. That is, the study exposes the analytical procedures that are hidden when the analysis is performed with the aid of statistical software.
An unbalanced nested design could arise from a number of factors, which include missing data, loss to follow-up, subjects falling sick, limited resources and natural barriers. Missing data can result from overt errors in measurements, patients not showing up for scheduled visits in a clinical trial and loss of samples (Bolton and Bon, 2009).

MATERIALS AND METHODS
Statistical model: The linear model for the three-way unbalanced nested arrangement is given by Sahai and Ojeda (2004) as Eq. 1:

$$y_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + \lambda_{k(ij)} + e_{ijkl}, \quad i = 1, \dots, a;\; j = 1, \dots, b_i;\; k = 1, \dots, c_{ij};\; l = 1, \dots, n_{ijk} \tag{1}$$

Where:
$y_{ijkl}$ = The lth observation within the kth level of factor C within the jth level of factor B within the ith level of factor A
$\mu$ = The overall mean
$\alpha_i$ = The effect due to the ith level of factor A
$\beta_{j(i)}$ = The effect due to the jth level of factor B nested within the ith level of factor A
$\lambda_{k(ij)}$ = The effect due to the kth level of factor C nested within the jth level of factor B nested within the ith level of factor A
$e_{ijkl}$ = The residual error of the observation $y_{ijkl}$

Restrictions are imposed on the parameters of Eq. 1 (Eq. 2); in the accompanying note, ℂ denotes any constant. Restriction (2) allows for estimating the model parameters using the least squares method.
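As a hedged illustration of what restriction (2) makes possible, the sketch below computes least squares estimates for Eq. 1 as differences of successive means. It assumes the restrictions are the usual weighted sum-to-zero constraints (for example $\sum_i n_{i..}\alpha_i = 0$); this is an assumption made for the sketch, not a statement of Eq. 2. The data and the helper names `pooled` and `estimate_effects` are hypothetical.

```python
from collections import defaultdict

# Hypothetical unbalanced data: key (i, j, k) = (A level, B level within A,
# C level within B); value = list of observations y_ijkl in that cell.
data = {
    (1, 1, 1): [5.0, 6.0],
    (1, 1, 2): [7.0],
    (1, 2, 1): [4.0, 5.0, 6.0],
    (2, 1, 1): [8.0, 9.0],
    (2, 1, 2): [7.0, 8.0, 9.0],
    (2, 2, 1): [6.0],
}


def mean(values):
    return sum(values) / len(values)


def pooled(data, depth):
    """Pool observations over all cells that share the first `depth` indices."""
    groups = defaultdict(list)
    for key, obs in data.items():
        groups[key[:depth]].extend(obs)
    return dict(groups)


def estimate_effects(data):
    """Return (mu, alpha, beta, lam) under the ASSUMED weighted restrictions."""
    grand = mean([y for obs in data.values() for y in obs])
    a_mean = {key: mean(obs) for key, obs in pooled(data, 1).items()}
    b_mean = {key: mean(obs) for key, obs in pooled(data, 2).items()}
    # Each effect is the mean at its own level minus the mean one level up.
    alpha = {key[0]: m - grand for key, m in a_mean.items()}
    beta = {key: m - a_mean[key[:1]] for key, m in b_mean.items()}
    lam = {key: mean(obs) - b_mean[key[:2]] for key, obs in data.items()}
    return grand, alpha, beta, lam


mu, alpha, beta, lam = estimate_effects(data)
print("mu =", round(mu, 4))
print("alpha =", {i: round(v, 4) for i, v in alpha.items()})
```

Under these assumed restrictions the estimates add back up to the cell means, so $\hat\mu + \hat\alpha_i + \hat\beta_{j(i)} + \hat\lambda_{k(ij)}$ reproduces the mean of cell $(i, j, k)$.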

Estimation of model parameters and sums of squares:
The least squares method has been widely used in estimating model parameters; see, for example, Naisipour et al. (2008), Kavitha and Duraiswamy (2011) and Rencher and Bruce (2008). The least squares method and restriction (2) are used to obtain the parameter estimates and the sums of squares of the effects. We refer the reader to appendix II for the derivation of the expected mean squares under the different models.
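The sums of squares for a three-way nested layout can be sketched as follows, using the standard definitions $SS_A = \sum_i n_{i..}(\bar{y}_{i...} - \bar{y}_{....})^2$, $SS_{B(A)} = \sum_{ij} n_{ij.}(\bar{y}_{ij..} - \bar{y}_{i...})^2$, $SS_{C(B)} = \sum_{ijk} n_{ijk}(\bar{y}_{ijk.} - \bar{y}_{ij..})^2$ and $SS_E = \sum_{ijkl}(y_{ijkl} - \bar{y}_{ijk.})^2$. The data and the function name `nested_sums_of_squares` are hypothetical, and the block is only a minimal sketch, not the computation used for the numerical example.

```python
from collections import defaultdict

# Hypothetical data keyed by (i, j, k); each value lists the y_ijkl of one cell.
data = {
    (1, 1, 1): [5.0, 6.0],
    (1, 1, 2): [7.0],
    (1, 2, 1): [4.0, 5.0, 6.0],
    (2, 1, 1): [8.0, 9.0],
    (2, 1, 2): [7.0, 8.0, 9.0],
    (2, 2, 1): [6.0],
}


def mean(values):
    return sum(values) / len(values)


def nested_sums_of_squares(data):
    """Partition the total sum of squares into A, B(A), C(B) and error."""
    all_obs = [y for obs in data.values() for y in obs]
    grand = mean(all_obs)
    a_obs, b_obs = defaultdict(list), defaultdict(list)
    for (i, j, k), obs in data.items():
        a_obs[i].extend(obs)
        b_obs[(i, j)].extend(obs)
    a_mean = {i: mean(v) for i, v in a_obs.items()}
    b_mean = {ij: mean(v) for ij, v in b_obs.items()}
    # Each term is weighted by the number of observations at that level.
    ss_a = sum(len(v) * (a_mean[i] - grand) ** 2 for i, v in a_obs.items())
    ss_b = sum(len(v) * (b_mean[ij] - a_mean[ij[0]]) ** 2
               for ij, v in b_obs.items())
    ss_c = sum(len(obs) * (mean(obs) - b_mean[key[:2]]) ** 2
               for key, obs in data.items())
    ss_e = sum((y - mean(obs)) ** 2 for obs in data.values() for y in obs)
    ss_t = sum((y - grand) ** 2 for y in all_obs)
    return {"A": ss_a, "B(A)": ss_b, "C(B)": ss_c, "Error": ss_e, "Total": ss_t}


ss = nested_sums_of_squares(data)
for source, value in ss.items():
    print(f"{source:6s} {value:.4f}")
```

A useful check on any implementation is that the four component sums of squares add up to the total sum of squares.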

Fixed effect model (Model I):
Factor A is fixed, factor B is fixed and factor C is fixed.

Assumptions of the model:
The assumptions of the model are given in Eq. 3: The expected mean squares are:

Mixed effect model (Model III$_A$):
Factor A is fixed, factor B is random and factor C is random. The assumptions of the model are given in Eq. 5. The expected mean squares for the model are:

Mixed effect model (Model III$_{AC}$):
Factor A is fixed, factor B is random and factor C is fixed. The assumptions of the model are given in Eq. 8. The expected mean squares for the model are:

where "≙" reads "is estimated by". This procedure is similar to Henderson's Method I; Sahai and Ojeda (2004) discussed Henderson's methods for estimating variance components for unbalanced data. The same procedure is used to obtain the variance components for the different models and the results are given below.

Mixed effect model (Model III$_A$):
Factor A is fixed; factor B and factor C are random.
The variance components are:

Mixed effect model (Model III$_B$):
Factor B is fixed, factor A and factor C are random.
The variance components are:

Mixed effect model (Model III$_C$):
Factor C is fixed, factor A and factor B are random.
The variance components are:

Mixed effect model (Model III$_{AC}$):
Factor B is random, factor A and factor C are fixed.
The variance components are:

Test procedures for the models:
The expected mean squares help in determining appropriate test statistics for testing hypotheses about the effects, since they determine which hypotheses are tested by each mean square. An F-statistic can only be formed when, under the appropriate hypothesis, two expected mean squares have the same value (Mason et al., 2003). In this study, we consider the test statistics under the different models. The variance estimates are:

Model I (Fixed-effect model):
We now show that:

The approximate F-statistics: The hypothesis $H_{0A}: \sigma_{\alpha} = 0$ is tested by:

where $F_A$ is approximately F-distributed with $a-1$ and $f_\theta$ degrees of freedom, where:

Derivation of pseudo-degrees of freedom, $f_\theta$: We construct the degrees of freedom $f_\theta$ by applying an approximation due to Satterthwaite (1946). A Satterthwaite approximation is based on assuming that a variance estimator has a chi-square distribution and solving for the implied degrees of freedom, using the method of moments (Valliant and Rust, 2010):

If we assume that $MS_B$ and $MS_C$ are independent, then Eq. 9 and 10:

Recall that:

By extending our idea of (11) in (9), we get:

Finally, the hypothesis $H_{0C}: \sigma_{\lambda} = 0$ is tested by:

The hypothesis $H_{0A}: \sigma_{\alpha} = 0$ is tested by:

where $F_A$ is approximately F-distributed with $a-1$ and $f_\theta$ degrees of freedom, where:

The hypothesis $H_{0B}: \beta_{j(i)} = 0$ is tested by:
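As a computational aside on the pseudo-degrees of freedom $f_\theta$ discussed in this section, the sketch below implements the general Satterthwaite (1946) formula for a linear combination $\hat\theta = \sum_r c_r MS_r$ of independent mean squares with $f_r$ degrees of freedom, namely $f_\theta \approx (\sum_r c_r MS_r)^2 / \sum_r (c_r MS_r)^2 / f_r$. The function name `satterthwaite_df` and all numerical values are hypothetical; the design-dependent coefficients $c_r$ must come from the expected mean squares of the model being tested, which are not reproduced here.

```python
def satterthwaite_df(coefficients, mean_squares, dfs):
    """Approximate degrees of freedom of sum(c_r * MS_r).

    Assumes the mean squares are independent, each distributed as a scaled
    chi-square with the corresponding entry of `dfs` degrees of freedom.
    """
    if not (len(coefficients) == len(mean_squares) == len(dfs)):
        raise ValueError("inputs must have the same length")
    terms = [c * ms for c, ms in zip(coefficients, mean_squares)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / f for t, f in zip(terms, dfs))
    return numerator / denominator


# Hypothetical denominator for an approximate F-test: a combination of
# MS_B, MS_C and MS_E with made-up coefficients and degrees of freedom.
coeffs = [0.9, 0.3, -0.2]
ms_values = [12.0, 8.0, 5.0]
ms_dfs = [3, 5, 20]

theta_hat = sum(c * ms for c, ms in zip(coeffs, ms_values))
f_theta = satterthwaite_df(coeffs, ms_values, ms_dfs)
print("synthetic mean square:", round(theta_hat, 4))
print("pseudo-degrees of freedom:", round(f_theta, 4))
```

With a single mean square and unit coefficient the formula returns that mean square's own degrees of freedom, which is a convenient sanity check. The approximation is generally regarded as less reliable when some coefficients are negative.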

RESULTS AND DISCUSSION
Numerical example: We give an illustrative example with hypothetical data on the hardness (crushing strength) of a particular tablet produced by a pharmaceutical company. The company has two production sites within a particular region. Two of the machines for producing the tablet are randomly selected from site one and three machines from site two. Based on the production capacity of each machine, two batches of the produced tablets are randomly selected from each of the machines at site one. At site two, two batches are selected from machine one, three batches from machine two and one batch from machine three. The crushing strengths of the tablets randomly selected from each of the batches are recorded in Table 1 below. The company wants to investigate whether the batch-to-batch variability within machines, the machine-to-machine variability within sites and the site effect are significant sources of variation in the crushing strength of the tablet. This is a case where the site is fixed and machines and batches are random.
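The degrees of freedom of the three nested effects follow directly from the layout just described. The sketch below encodes the number of batches per machine at each site, as stated in the example, and applies the standard nested-design formulas $a - 1$, $\sum_i b_i - a$ and $\sum_{ij} c_{ij} - \sum_i b_i$. The variable and function names are hypothetical.

```python
# Batches per machine, taken from the example: site 1 has two machines with
# two batches each; site 2 has three machines with 2, 3 and 1 batches.
batches_per_machine = {1: [2, 2], 2: [2, 3, 1]}


def nested_degrees_of_freedom(layout):
    """Degrees of freedom for A, B within A and C within B."""
    a = len(layout)                                    # number of sites
    total_b = sum(len(c) for c in layout.values())     # sum of b_i
    total_c = sum(sum(c) for c in layout.values())     # sum of c_ij
    return {
        "sites": a - 1,
        "machines within sites": total_b - a,
        "batches within machines": total_c - total_b,
    }


df = nested_degrees_of_freedom(batches_per_machine)
for source, value in df.items():
    print(f"{source}: {value}")
```

The error degrees of freedom, $N - \sum_{ij} c_{ij}$, additionally require the cell counts $n_{ijk}$ from the data table.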
We now proceed to construct the $n_{ijk}$-table (Table 3) to make the computation easier. The $n_{ijk}$-table is a table of counts of the number of observations for the ith Site, jth Machine and kth Batch.
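A count table of this kind can be built mechanically from long-format records. The following is a minimal sketch with made-up records of the form (site, machine, batch, crushing strength); the values are not those of the example, and the function name `count_tables` is hypothetical.

```python
from collections import Counter

# Hypothetical records: (site i, machine j, batch k, crushing strength).
records = [
    (1, 1, 1, 7.2), (1, 1, 1, 6.9), (1, 1, 2, 7.5),
    (1, 2, 1, 6.8), (1, 2, 1, 7.1), (1, 2, 1, 7.0),
    (2, 1, 1, 7.4), (2, 1, 2, 7.3), (2, 1, 2, 7.6),
]


def count_tables(records):
    """Return the cell counts n_ijk and the marginal counts n_ij., n_i.., N."""
    n_ijk = Counter((i, j, k) for i, j, k, _ in records)
    n_ij = Counter((i, j) for i, j, _, _ in records)
    n_i = Counter(i for i, _, _, _ in records)
    return n_ijk, n_ij, n_i, len(records)


n_ijk, n_ij, n_i, n_total = count_tables(records)
for cell in sorted(n_ijk):
    print(cell, n_ijk[cell])
print("N =", n_total)
```

The marginal counts $n_{ij.}$ and $n_{i..}$ are the weights that enter the sums of squares for machines within sites and for sites.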

Computation of sum of squares:
Let:

The results of these computations are summarized in Table 3. The values of the approximate F-tests for factors A and B are obtained as follows:

On the basis of the results shown in Table 3, we conclude that there is no significant difference between Sites, no significant machine-to-machine variability within the Sites and no significant batch-to-batch variability within the machines at the one percent and five percent significance levels.

Mixed effect model (Model III
We obtained the equivalents of $\bar{y}_{ijk.}$ and $\bar{y}_{ij..}$ using (1) and (5)