A Note on Model Selection in Mixture Experiments

In mixture experiments, determining the best model for the mixture system is important for both understanding and interpreting the system. Different methods have been used to obtain the best model in mixture experiments, the most common being stepwise-type methods. However, depending on the chosen criteria, the models obtained with these methods are not always the best ones. Because these models can be affected by collinearity, this paper uses an alternative approach for determining the models used to describe the mixture surface over the experimental region. The approach is based on examining all possible subset regression models obtained from the mixture model. To determine the best subset model, the condition numbers of the models and the model control graphs are also taken into account. Finally, the proposed approach is investigated on the flare data set, which is widely known in the literature.


INTRODUCTION
In mixture experiments, the measured response is assumed to depend only on the proportions of the ingredients present in the mixture and not on the amount of the mixture. For example, the response might be the tensile strength of stainless steel, which is a mixture of iron, nickel, copper and chromium, or the octane rating of a gasoline blend. The purpose of mixture experiments is to build an appropriate model relating the response(s) to the mixture components.
All of the work on mixture models has been based on response surface concepts. A model is fitted to data collected according to an experimental design. Various mixture models can be used in the analysis of mixture experiments. However, determining the best model for the mixture system is important for both understanding and interpreting it, since the fitted models are used to screen the components, predict the response(s), determine the effects of the components on the response(s), or optimize the response(s) over the experimental region.
In general, computer-based methods for choosing the best subset regression have been suggested for determining the best model in mixture experiments. Cornell [1], Piepel et al. [2][3][4] and Martin et al. [5] used stepwise regression for choosing the model in mixture experiments, and Draper and St. John [6] used the backward elimination procedure. In contrast to these methods, many other methods examine all possible subset regression models. One of these is the "RSQUARE procedure" in SAS, which was used in mixture experiments by Khuri [7] and Cornell [1]. Khuri revisited the work done by Cornell [1] and gave some collinearity diagnostic measures for each p-parameter submodel with the highest $R^2$ value. However, for each p-parameter submodel size, models including different terms should not be ignored as alternatives to the model with the highest $R^2$ value. The model with the highest $R^2$ value may be unsuitable for interpreting the mixture system, since it can be more affected by collinearity than other models.
The purpose of this study is to draw attention to the results obtained by using an alternative approach for choosing a model in mixture experiments. The approach is based on examining all possible subset regression models obtained from the mixture model. By comparing all possible subsets, an investigator can not only determine the best reduced models according to a selected criterion such as the adjusted $R^2$ ($R_A^2$), but also identify alternatives to the best ones. In addition, extra criteria are taken into account in determining the alternative models. In this way, with the help of models that include different interaction terms, the mixture system can be interpreted better and the role of each component in the system can be understood more easily.

Mixture experiments:
A mixture experiment involves mixing various proportions of two or more components to make different compositions of an end product. In a q-component mixture in which $x_i$ represents the proportion of the $i$th component present in the mixture,

$$x_i \ge 0, \quad i = 1, 2, \ldots, q, \qquad \sum_{i=1}^{q} x_i = 1. \tag{1}$$

The composition space of the q components takes the form of a regular $(q-1)$-dimensional simplex. Physical, theoretical, or economic considerations often impose additional constraints on individual components [8]:

$$0 \le L_i \le x_i \le U_i \le 1, \quad i = 1, 2, \ldots, q. \tag{2}$$
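As a small illustration of the constraints above, the following Python sketch checks whether a candidate blend satisfies the sum-to-one constraint and the individual lower and upper bounds. The function name `is_feasible`, the tolerance and the bounds in the example are hypothetical choices for illustration, not values from the study.

```python
import numpy as np

def is_feasible(x, lower, upper, tol=1e-9):
    """True if the blend x sums to one and every x_i lies in [L_i, U_i]."""
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    sums_to_one = abs(x.sum() - 1.0) <= tol
    in_bounds = np.all(x >= lower - tol) and np.all(x <= upper + tol)
    return bool(sums_to_one and in_bounds)

# Hypothetical bounds for a three-component mixture.
lower, upper = [0.2, 0.1, 0.1], [0.7, 0.6, 0.5]
print(is_feasible([0.5, 0.3, 0.2], lower, upper))  # True
print(is_feasible([0.8, 0.1, 0.1], lower, upper))  # False: x1 exceeds its upper bound
```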
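It is assumed that the response or property of interest, denoted by $\eta$, can be expressed in terms of a suitable function $f$ of the mixture variables $x_i$, that is, $\eta = f(x_1, x_2, \ldots, x_q)$. A typical model may thus be written

$$y_u = f(x_{1u}, x_{2u}, \ldots, x_{qu}) + \varepsilon_u, \quad u = 1, 2, \ldots, n \tag{3}$$

where $\varepsilon_u$ is assumed to be a random error. The model forms most commonly used in fitting data are the canonical polynomials introduced by Scheffé [9], of the form

$$E(y) = \sum_{i=1}^{q}\beta_i x_i \tag{4}$$

$$E(y) = \sum_{i=1}^{q}\beta_i x_i + \sum\sum_{i<j}^{q}\beta_{ij}x_i x_j \tag{5}$$

For modeling well-behaved systems, the Scheffé polynomials are generally adequate. In some situations, however, other model forms fit better than the Scheffé polynomials. For example, as an alternative to Scheffé mixture models, models including inverse terms are used to model an extreme change in the response as one or more components approach the boundary of the simplex region [6]. The quadratic model including inverse terms proposed by Draper and St. John is

$$E(y) = \sum_{i=1}^{q}\beta_i x_i + \sum\sum_{i<j}^{q}\beta_{ij}x_i x_j + \sum_{i=1}^{q}\beta_{-i}x_i^{-1} \tag{6}$$

and the first-degree model including inverse terms omits the cross-product terms $x_i x_j$. Scheffé polynomial models fail to model the additive effect of one component while at the same time accommodating the curvilinear blending effects of the remaining components. To model these effects jointly, Becker developed a set of mixture models that are homogeneous of degree one [10]. They provide alternatives to the Scheffé polynomials. Becker's three second-order models are of the form

$$\text{H1:}\quad E(y) = \sum_{i=1}^{q}\beta_i x_i + \sum\sum_{i<j}^{q}\beta_{ij}\min(x_i, x_j)$$

$$\text{H2:}\quad E(y) = \sum_{i=1}^{q}\beta_i x_i + \sum\sum_{i<j}^{q}\beta_{ij}\frac{x_i x_j}{x_i + x_j}$$

$$\text{H3:}\quad E(y) = \sum_{i=1}^{q}\beta_i x_i + \sum\sum_{i<j}^{q}\beta_{ij}(x_i x_j)^{1/2} \tag{7}$$

In the H2 model, the term $x_i x_j/(x_i + x_j)$ is taken to be zero when $x_i + x_j = 0$. As usual, the Scheffé canonical polynomials, the mixture models with inverse terms and Becker's homogeneous models can be written in matrix form as

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{8}$$

where $\mathbf{y}$ is the $n \times 1$ vector of responses, $X$ is the $n \times p$ matrix of model terms, $p$ is the number of terms in the model, $\boldsymbol{\beta}$ is the $p \times 1$ vector of parameters to be estimated and $\boldsymbol{\varepsilon}$ is the $n \times 1$ vector of errors. The errors are assumed to satisfy

$$E(\boldsymbol{\varepsilon}) = \mathbf{0}, \qquad \operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2 I_n \tag{9}$$

where $I_n$ is the $n \times n$ identity matrix and $\sigma^2$ is the error variance. A comprehensive reference on the design and analysis of mixture data is given by Cornell [11,12].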
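To make the model families concrete, the sketch below builds model-matrix columns for the Scheffé quadratic model, Becker's H2 model and a linear model augmented with inverse terms, starting from an $n \times q$ matrix of component proportions. It is a minimal NumPy sketch under our own assumptions: the function names are hypothetical, component indices are 0-based, and inverse terms are only defined for proportions strictly greater than zero.

```python
from itertools import combinations
import numpy as np

def scheffe_quadratic(X):
    """Columns x_i followed by x_i * x_j (i < j) for an (n, q) proportion matrix X."""
    q = X.shape[1]
    cols = [X[:, i] for i in range(q)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(q), 2)]
    return np.column_stack(cols)

def becker_h2(X):
    """Columns x_i followed by x_i * x_j / (x_i + x_j), set to 0 when x_i + x_j = 0."""
    q = X.shape[1]
    cols = [X[:, i] for i in range(q)]
    for i, j in combinations(range(q), 2):
        s = X[:, i] + X[:, j]
        safe = np.where(s > 0, s, 1.0)  # avoid dividing by zero
        cols.append(np.where(s > 0, X[:, i] * X[:, j] / safe, 0.0))
    return np.column_stack(cols)

def with_inverse_terms(X, which):
    """Columns x_i followed by 1 / x_k for each 0-based k in `which` (x_k must be > 0)."""
    return np.column_stack([X] + [1.0 / X[:, k] for k in which])

# Hypothetical blends of q = 3 components.
X = np.array([[0.5, 0.3, 0.2],
              [1.0, 0.0, 0.0]])
print(scheffe_quadratic(X).shape)  # (2, 6)
print(becker_h2(X)[0, 3])          # x1 * x2 / (x1 + x2) for the first blend
print(with_inverse_terms(X[:1], [1]))
```

The second blend sits on a vertex, so its H2 blending terms are zero by the convention above; it is excluded from the inverse-term call because $1/x_2$ is undefined there.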

Determination and comparison of mixture models:
In mixture experiments, reducing the model is as important as choosing among different mixture model forms, since including every term of the chosen model form is rarely a good approach. A full model may contain highly correlated terms, and the resulting distortion of the parameter estimates can make it hard to draw conclusions about the mixture system.
In mixture experiments, determining the best model for the mixture system is important for both understanding and interpreting the system. When there are many candidate model terms, various methods can be used for choosing a regression model, such as forward selection, backward elimination and stepwise regression. Cornell [11] also noted that stepwise regression can be investigated for various models in mixture experiments. The objective is to obtain a model form that not only contains an adequate amount of information about the mixture system under investigation but whose form also makes sense. Stepwise-type methods have serious drawbacks, however: because they handle variables one at a time, they do not necessarily give the best model according to the selected criterion (for example, $R_A^2$). In addition, these methods yield only a single model, so better models may be missed.
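The following toy example, with hypothetical data, illustrates how handling variables one at a time can miss better models. A greedy forward selection (a simplified stand-in for the stepwise-type methods, using the residual sum of squares of a no-intercept least-squares fit as the criterion) is compared with an exhaustive search over all two-term subsets.

```python
from itertools import combinations
import numpy as np

def rss(cols, names, y):
    """Residual sum of squares of the no-intercept least-squares fit of y on the named columns."""
    X = np.column_stack([cols[n] for n in names])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_selection(cols, y, k):
    """Greedy: at each step add the single term that most reduces the RSS."""
    chosen = []
    for _ in range(k):
        candidates = [n for n in cols if n not in chosen]
        best = min(candidates, key=lambda n: rss(cols, chosen + [n], y))
        chosen.append(best)
    return chosen

def best_subset(cols, y, k):
    """Exhaustive search over every k-term subset."""
    return min(combinations(cols, k), key=lambda s: rss(cols, s, y))

# Hypothetical response and candidate terms.
y = np.array([1.0, 0.0, 0.0, 0.0])
cols = {
    "a": np.array([1.0, 3.0, 0.0, 0.0]),
    "b": np.array([0.0, 3.0, 0.0, 0.0]),
    "c": np.array([1.0, 0.0, 1.0, 0.0]),
}
greedy = forward_selection(cols, y, 2)
exhaustive = best_subset(cols, y, 2)
print(greedy, rss(cols, greedy, y))          # ['c', 'a'] with RSS 9/19
print(exhaustive, rss(cols, exhaustive, y))  # ('a', 'b') with RSS close to 0
```

Forward selection first picks `c`, the best single term, and is then locked into it; the pair `a`, `b` fits `y` exactly but is never reached because `a` and `b` are individually weaker than `c`.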
Mixture problems are particularly prone to ill-conditioning (or collinearity) because of constraint (1). Collinearity among the terms may lead to inconsistent or confusing conclusions when the results of different stepwise regression procedures are compared. This inconsistency makes variable selection a potentially misleading process when collinearity is present. Ill-conditioning may not be a problem if the goal is prediction within the experimental region; however, it can be a serious problem if interpretation of the coefficients is the objective [8].
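Constraint (1) is the source of exact collinearity in mixture models: a column of ones equals the sum of the component columns, which is why the Scheffé models carry no separate intercept. The short check below, on a hypothetical four-run design, confirms this with NumPy's `matrix_rank`.

```python
import numpy as np

# Hypothetical design in q = 3 components; every row sums to one.
X = np.array([
    [0.50, 0.30, 0.20],
    [0.40, 0.40, 0.20],
    [0.45, 0.25, 0.30],
    [0.55, 0.25, 0.20],
])
assert np.allclose(X.sum(axis=1), 1.0)

# An intercept column duplicates x1 + x2 + x3, so adding it does not raise the rank.
with_intercept = np.column_stack([np.ones(len(X)), X])
print(np.linalg.matrix_rank(X))               # 3
print(np.linalg.matrix_rank(with_intercept))  # 3, although the matrix has 4 columns
```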
Standard variable selection methods do not perform well when the data are highly collinear. For this reason, all possible subset regression models should be examined in order to obtain the best reduced model. The sequential model fitting methods proposed by Draper and St. John [6] for mixture experiments can be useful, but with many terms they can require too much labor. A preferable approach is to fit all possible regression models and evaluate them according to some criterion; in this way, several of the best regression models can be selected.
In order to find the best subset regression model, the "RESEARCH procedure" in GENSTAT was used [13]. Three criteria are taken into account in determining the best models. First, the linear mixture terms $(x_1, x_2, \ldots, x_q)$ were kept in every model, and all possible combinations of the remaining terms were added to them. The linear mixture terms are kept so that every model satisfies the hierarchy principle, which is important for the equivalence of models [7]. In addition to the linear mixture terms, the number of models with $t$ additional terms is $\binom{p-q}{t}$, $t = 1, 2, \ldots, p-q$. Therefore, over the different values of $t$, a total of

$$\sum_{t=1}^{p-q}\binom{p-q}{t} = 2^{p-q} - 1$$

subset regression models is obtained. In these models, terms whose p-values from the F-statistics are smaller than 0.05 are regarded as significant. However, these p-values can be distorted by collinearity, so some terms that are important for the mixture system may be dropped. In this situation, rather than only considering models whose terms are all significant, the VIF values of the terms should be taken into account. Second, the condition numbers of the models are used to compare the reduced models. A useful measure of collinearity is the condition number, $\kappa$, defined by

$$\kappa = \left(\frac{\lambda_{\max}}{\lambda_{\min}}\right)^{1/2}$$

where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and the smallest eigenvalues of $X'X$, respectively (the columns of $X$ having been scaled to unit length). Smaller condition numbers indicate more stable (better conditioned) least squares estimates than larger ones. In this study, subset regression models with a condition number less than 40 are considered. In some situations, the condition number of the model with the highest $R^2$ can be greater than those of other models; this distorts the parameter estimates and may lead to misinterpretation of the system.
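A minimal NumPy sketch of the two collinearity diagnostics follows. It assumes the square-root form of the condition number, computed from $X'X$ after scaling each column to unit length without centering, and it takes the VIFs as the diagonal elements of the inverse of that scaled matrix. Other VIF definitions for no-intercept mixture models exist, so this choice is an assumption rather than the exact computation used in the study; the function names and data are hypothetical.

```python
import numpy as np

def _scaled_gram(X):
    """Z'Z where Z is X with each column scaled to unit length (no centering)."""
    Z = X / np.linalg.norm(X, axis=0)
    return Z.T @ Z

def condition_number(X):
    """kappa = sqrt(lambda_max / lambda_min) of the unit-length-scaled Z'Z."""
    eig = np.linalg.eigvalsh(_scaled_gram(X))  # eigenvalues in ascending order
    return float(np.sqrt(eig[-1] / eig[0]))

def vifs(X):
    """Non-centered VIFs: diagonal of (Z'Z)^{-1} (an assumed definition)."""
    return np.diag(np.linalg.inv(_scaled_gram(X)))

# Hypothetical matrix whose two columns are almost parallel.
X_near = np.array([[1.0, 1.00],
                   [1.0, 1.01],
                   [1.0, 0.99]])
print(condition_number(X_near))  # well above 40
print(vifs(X_near))              # both far above 100
```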
A condition number below 40 does not guarantee that the VIF values of all terms are below 100. For this reason, only models with condition numbers less than 40 whose terms all have VIF values less than 100 are considered.
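The screening rule can be sketched as an all-subsets loop: the linear terms are always kept, every non-empty combination of the remaining terms is added in turn, and a model is retained only if $\kappa < 40$ and every VIF is below 100. This is a hypothetical re-implementation in NumPy, not the GENSTAT procedure used in the study; the design, the term names and the VIF definition (diagonal of the inverse of the unit-length-scaled $X'X$) are assumptions.

```python
from itertools import combinations
import numpy as np

def screen_subsets(linear, extra, kappa_max=40.0, vif_max=100.0):
    """Enumerate models = linear terms + each non-empty subset of the extra terms.

    linear: (n, q) array of component proportions, always kept (hierarchy).
    extra:  dict mapping a term name to its (n,) column.
    Returns (accepted, total); accepted holds (subset, kappa, max_vif).
    """
    names = list(extra)
    accepted, total = [], 0
    for t in range(1, len(names) + 1):
        for subset in combinations(names, t):
            total += 1
            X = np.column_stack([linear] + [extra[s] for s in subset])
            Z = X / np.linalg.norm(X, axis=0)  # unit-length columns
            G = Z.T @ Z
            eig = np.linalg.eigvalsh(G)        # ascending order
            if eig[0] <= 1e-12:                # numerically singular: reject
                continue
            kappa = float(np.sqrt(eig[-1] / eig[0]))
            max_vif = float(np.diag(np.linalg.inv(G)).max())
            if kappa < kappa_max and max_vif < vif_max:
                accepted.append((subset, kappa, max_vif))
    return accepted, total

# Hypothetical simplex-centroid design in q = 3 components.
pts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                [.5, .5, 0], [.5, 0, .5], [0, .5, .5],
                [1/3, 1/3, 1/3]])
cross = {f"x{i+1}x{j+1}": pts[:, i] * pts[:, j]
         for i, j in combinations(range(3), 2)}
accepted, total = screen_subsets(pts, cross)
print(total)  # 7 = 2**(6 - 3) - 1
for subset, kappa, max_vif in accepted:
    print(subset, round(kappa, 1), round(max_vif, 1))
```

With $q = 3$ linear terms and three cross-product terms ($p = 6$), the loop visits $2^{p-q} - 1 = 7$ subsets, matching the count given earlier.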
Third, to examine which of the models are adequate, model control graphs should be obtained. For the models whose model control graphs are adequate, a final decision can be made by comparing their $R_A^2$ and MSE values. The proposed approach is examined below on the flare data set.
The authors of [14] presented an example to illustrate their extreme-vertices design. The component proportions for the design points, together with the measured illumination values, are given in Table 1. Snee [15] used the homogeneous mixture models (7) for modeling the flare data set, and Draper and St. John [6] compared the mixture models (6) and (7) for the same data. Piepel and Cornell [16] summarized the models proposed for the flare data set up to that time. These models have three terms in addition to the linear mixture terms, and they also have the highest $R^2$ values. However, because the experimental region is restricted, their parameter estimates are affected by collinearity. For this reason, comparing the models by their condition numbers and using models with small condition numbers gives a more reliable interpretation of the system. Snee [15] and Draper and St. John [6] worked with pseudo-components for the flare data set; in this paper, subset regression models in the actual components are given for the Scheffé, homogeneous H2 and inverse-term model forms.
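For the final comparison of models with adequate model control graphs, $R_A^2$ and MSE can be computed as below. This sketch assumes the usual definitions $\text{MSE} = SSE/(n-p)$ and $R_A^2 = 1 - \text{MSE}/[SST/(n-1)]$ with a mean-corrected $SST$, which is appropriate because the linear mixture terms sum to one, so the constant lies in the column space of $X$. The function name, design and response values are hypothetical; the residual plots still have to be inspected separately.

```python
import numpy as np

def fit_summary(X, y):
    """R^2, adjusted R^2 and MSE of a no-intercept mixture fit of y on X (p columns)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))  # mean-corrected total sum of squares
    mse = sse / (n - p)
    return {"R2": 1.0 - sse / sst,
            "R2_adj": 1.0 - mse / (sst / (n - 1)),
            "MSE": mse}

# Hypothetical simplex-centroid design (q = 3) and made-up responses.
pts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                [.5, .5, 0], [.5, 0, .5], [0, .5, .5],
                [1/3, 1/3, 1/3]])
y = np.array([10.0, 20.0, 30.0, 16.0, 19.0, 26.0, 21.0])
print(fit_summary(pts, y))  # linear Scheffé model, p = 3
```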
In addition, when stepwise regression and forward selection are applied to the Scheffé model, the model with only the $x_2x_3$ term is obtained, whereas backward elimination gives a model with $x_1$ … When the homogeneous H2 model is examined, five models with one term and condition numbers less than 40 are obtained. These models include each of the $x_ix_j/(x_i + x_j)$ terms except for … . A model with … is obtained by using the backward elimination method; in contrast, forward selection and stepwise regression give a model with … . The condition numbers for these models are 45.3 and 12.7, respectively. The model control graphs of the Scheffé and homogeneous H2 subset regression models obtained show that these models are not adequate. Next, consider the mixture models including inverse terms. For the first-degree model including an inverse term, the one-term models with condition numbers less than 40 are those with $x_2^{-1}$, $x_3^{-1}$ and $x_4^{-1}$, whose condition numbers are 16.8, 16.7 and 39.4, respectively. As the VIF value of $x_4^{-1}$ is 106, this model was discarded. The model with the highest $R^2$ value was also used by Piepel and Cornell [16]; the VIF value of $x_1^{-1}$ in that model is 3587.9 and its condition number is 179.2. Therefore, among the one-term models, only those with $x_2^{-1}$ and $x_3^{-1}$ can be recommended for the model including an inverse term. The summary statistics of these models are … . The VIF values are equal, 99.3, for the $x_3$ term in the first model and the $x_2$ term in the second model.
Therefore, the second-degree models including inverse terms that can be recommended include the terms … and … . A model with … is obtained by backward elimination, forward selection and stepwise regression alike. In this model, the VIF value for $x_1x_3$ is 261.6 and the condition number is 47.4. Because of their condition numbers and adequate model control graphs, the models including inverse terms are better suited to interpreting the mixture system than the Scheffé and homogeneous H2 mixture models. The investigator can therefore choose the best model among those with one or two terms. For example, the model control graph of the model with terms … is shown in Fig. 1.
The mixture surfaces of this model over the experimental region for $x_4 = 0.03$ and $x_4 = 0.08$ are shown in Fig. 2.

CONCLUSION
In this study, the results obtained by different subset selection methods for the flare data set were compared. Each of the models obtained by the backward elimination, forward selection and stepwise regression methods is one of the models obtained by all possible subset selection. By using all possible subset selection, models can be obtained according to whichever criteria are desired. On the other hand, because the models with the highest $R^2$ values obtained by the "RSQUARE procedure" are affected by collinearity, extra criteria were taken into account in choosing the models used to interpret the mixture system. These criteria are the comparison of the condition numbers and the examination of the model control graphs of the models with small condition numbers. Comparing the condition numbers of the models yielded models with more consistent parameter values. Therefore, the condition number of each model should be taken into account when all possible subset regression models are investigated to determine the mixture model.
In addition, the linear mixture terms were kept in every model so that the hierarchy principle is satisfied. In mixture experiments, the hierarchy principle is essential for the models obtained with pseudo-components and with actual components to be equivalent. Expressing the components in terms of pseudo-components also alleviates problems due to correlations among the coefficients. However, because of the structure of models including inverse terms and of homogeneous H2 mixture models, this equivalence has to be investigated for both the pseudo-components and the actual components. Finally, in the presence of collinearity, the ridge trace can be used as an alternative approach to choosing the model in mixture experiments.
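For reference, the commonly used L-pseudocomponent transformation can be sketched as follows, assuming lower bounds $L_i$ with $\sum_j L_j < 1$: $x_i' = (x_i - L_i)/(1 - \sum_j L_j)$, with back-transform $x_i = L_i + (1 - \sum_j L_j)\,x_i'$. The function names and the bounds in the example are hypothetical and are not taken from the flare study.

```python
import numpy as np

def to_pseudo(x, lower):
    """L-pseudocomponents x'_i = (x_i - L_i) / (1 - sum_j L_j)."""
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return (x - lower) / (1.0 - lower.sum())

def from_pseudo(xp, lower):
    """Back-transform x_i = L_i + (1 - sum_j L_j) * x'_i."""
    xp = np.asarray(xp, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return lower + (1.0 - lower.sum()) * xp

# Hypothetical lower bounds and blend.
lower = [0.2, 0.1, 0.1]
xp = to_pseudo([0.5, 0.3, 0.2], lower)
print(xp)                      # approximately [0.5, 0.3333, 0.1667]; sums to one
print(from_pseudo(xp, lower))  # recovers the original blend
```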