Some Test Statistics for Testing the Binomial Parameter: Empirical Power Comparison

Problem statement: The binomial distribution is one of the most useful probability distributions in the fields of quality control and the physical and medical sciences. Many questions of interest to the health worker relate to making inference about the unknown population proportion, the parameter of the binomial distribution. This study considers the problem of hypothesis testing for the parameter of a binomial distribution. Approach: Different test statistics available in the literature are reviewed and compared based on their empirical size and power properties. Since a theoretical comparison is not possible, a simulation study has been conducted to compare the performance of the test statistics. To illustrate the findings of the paper, two real-life health-related data sets are analyzed. Results: The simulation study suggests that some methods have better size and power properties than the other test statistics. The performance of the proposed test statistics also depends on the hypothesized value of the binomial parameter. Conclusions/Recommendations: Practitioners should be careful about the hypothesized value of the binomial parameter p. If the hypothesized value is near 0.5, any test is acceptable for moderate to large sample sizes. However, for testing an extreme or small value of p, one might need a very large sample size to achieve good power and an actual size close to the nominal level.


INTRODUCTION
The binomial distribution is one of the most widely encountered probability distributions in applied statistics. Many questions of interest to the health worker relate to making inference about the unknown population proportion, the parameter of the binomial distribution. For example, one may be interested in the proportion of recoveries among patients receiving a particular treatment, or the percentage of subjects with Impaired Fasting Glucose in a population of interest (Wagenknecht et al., 2003). To make inference about the unknown parameter of the binomial distribution, one may consider either confidence intervals or hypothesis testing. Several authors have discussed confidence interval estimates over the years. Among them, Agresti and Coull (1998), Anscombe (1948), Barker (2002), Bartlett (1947), Cai (2005), Casella (1986), Efron (1987), Flowerdew and Aitkin (2006), Freeman and Tukey (1950), Garwood (1936) and Wilson (1927) are notable. However, the literature on test statistics for testing the proportion of success of a binomial distribution is limited. This study considers various available test statistics, namely the exact Clopper-Pearson method, the Bayesian method, the Wilson method without continuity correction, the Wilson method with continuity correction, the Wald method without continuity correction, the Wald method with continuity correction, the recentered Wald method without continuity correction, the recentered Wald method with continuity correction, the bootstrap method without continuity correction, the bootstrap method with continuity correction, the arcsine (variance stabilizing) transformation, and the arcsine (variance stabilizing) transformation with a continuity correction, for testing the binomial proportion, and compares them under the same simulation conditions. The main contribution of this study is therefore to compare test statistics proposed by different researchers at different times under the same simulation conditions and to identify some good test statistics based on the size and power of the test.
Since a theoretical comparison is not possible, a simulation study has been made to compare performances of the proposed test statistics.

MATERIALS AND METHODS
Suppose X_1, X_2, ..., X_n is an iid random sample from a binomial population with parameters n and p. Consider the following hypotheses: Null hypothesis: H_0: p = p_0. Alternative hypothesis: H_a: p = p_0 ± c, where c ∈ [0, 1) is a constant. Here we are interested in a two-tailed test; however, one may easily follow the same procedure for left- or right-tailed tests. When c = 0, we obtain the size of the test (the type I error rate, α). When c ≠ 0, we obtain the power (1-β) of the test statistic. Our objective is to test against a proposed value of the parameter p at a specific significance level. We have considered 14 different test statistics for testing the binomial parameter p. Since references for all of the proposed test statistics are available, we discuss them only briefly here.
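As an illustration of what such test statistics look like, the following minimal sketch computes two of the families named above in their common textbook forms: a Wald statistic (with an optional continuity correction of 1/(2n)) and an arcsine variance-stabilized statistic. The function names and the exact form of the continuity correction are assumptions made for this sketch; they are not taken from the definitions used in the study, which may differ in detail.

```python
import math


def wald_z(x, n, p0, continuity=False):
    """Wald statistic: the standard error is estimated from the sample proportion."""
    p_hat = x / n
    diff = abs(p_hat - p0)
    if continuity:
        # Assumed textbook correction: shrink |p_hat - p0| by 1/(2n), not below zero
        diff = max(diff - 1 / (2 * n), 0.0)
    if diff == 0.0:
        return 0.0
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # When x is 0 or n the estimated standard error is zero and the statistic is infinite
    magnitude = diff / se if se > 0 else math.inf
    return math.copysign(magnitude, p_hat - p0)


def arcsine_z(x, n, p0):
    """Arcsine statistic: arcsin(sqrt(p_hat)) has variance close to 1/(4n)."""
    p_hat = x / n
    return 2 * math.sqrt(n) * (math.asin(math.sqrt(p_hat)) - math.asin(math.sqrt(p0)))


# Hypothetical data: 30 successes in 100 trials, testing H_0: p = 0.25
z_wald = wald_z(30, 100, 0.25)
z_wald_cc = wald_z(30, 100, 0.25, continuity=True)
z_arcsine = arcsine_z(30, 100, 0.25)
```

In a two-tailed test at α = 0.05, each statistic would be compared in absolute value with the standard normal critical value 1.96. The Wald form breaks down when the observed count is 0 or n, which is one reason such statistics can behave poorly for small n and small p.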

RESULTS
The main objective of this study is to find some good test statistics for testing the parameter of a binomial distribution. Since a theoretical comparison is not possible, a simulation study has been made to compare the size and power performances of the test statistics.
In each case, 5000 random samples are generated. The most common 5% level of significance (α = 0.05) is used to compute the empirical power. We compare the performance of the test statistics based on empirical sizes and powers, which are calculated as the fraction of rejections of the null hypothesis out of 5000 simulation replications. Empirical size and power are calculated for H_0: p = p_0 against the alternative H_a: p = p_0 ± c. We obtain the size of the test when c = 0, and otherwise the power of the considered test statistics. For α = 0.05, the simulation results are presented in Tables 2-6.
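The simulation scheme described above can be sketched as follows. This is a minimal illustration, assuming a two-sided score (Wilson-type) test as the test statistic; the sample size n = 50, the hypothesized value p_0 = 0.5, the shift c = 0.2, the random seed and all function names are choices made for this sketch and are not the settings reported in the tables.

```python
import math
import random


def score_test_rejects(x, n, p0, z_crit=1.959964):
    """Two-sided score test at alpha = 0.05: reject H_0 when |z| exceeds z_crit."""
    z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return abs(z) > z_crit


def empirical_rejection_rate(n, p0, c, reps=5000, seed=1):
    """Fraction of simulated samples rejecting H_0: p = p0 when the true p is p0 + c."""
    rng = random.Random(seed)
    p_true = p0 + c
    rejections = 0
    for _ in range(reps):
        # One binomial count as the number of successes in n Bernoulli(p_true) trials
        x = sum(rng.random() < p_true for _ in range(n))
        if score_test_rejects(x, n, p0):
            rejections += 1
    return rejections / reps


size = empirical_rejection_rate(n=50, p0=0.5, c=0.0)   # c = 0 gives the empirical size
power = empirical_rejection_rate(n=50, p0=0.5, c=0.2)  # c != 0 gives the empirical power
```

Swapping `score_test_rejects` for another decision rule while keeping the same seed and replication count is one way to compare several test statistics under identical simulation conditions.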

DISCUSSION
From Tables 2-6 we observe a general pattern: as the sample size increases, the power of the test also increases and the empirical size approaches the nominal level of 0.05. Power also increases as the true value of p departs from the hypothesized value p_0. For large n, the test statistics do not differ greatly in terms of power or attainment of the nominal size. However, substantial differences are observed when both the sample size and p are small. Overall, based on the empirical power and size of the test, we may conclude that group 1 (Methods 6, 8, 10, 12) performed best, followed by group 2 (Methods 1, 4, 5, 13) and group 3 (Methods 2, 3, 7, 11, 14), while Method 9 performed worst.

Applications:
Example 1: Wagenknecht et al. (2003) collected data on a sample of 301 Hispanic women living in San Antonio, Texas. One variable of interest was the percentage of subjects with Impaired Fasting Glucose (IFG). IFG refers to a metabolic stage intermediate between normal glucose homeostasis and diabetes. In the study, 24 women were classified in the IFG stage. The article cites a population estimate for IFG among Hispanic women of 6.3 percent. We would like to test whether there is sufficient evidence to indicate that the population of Hispanic women in San Antonio has a prevalence of IFG different from 6.3%. The hypotheses are H_0: p = 0.063 against H_1: p ≠ 0.063. From the sample we have p̂ = 24/301 = 0.080. The results of the test from all the methods discussed here are given in Table 7. All methods lead to the same conclusion, "do not reject H_0", at the 0.05 significance level. However, we might prefer those tests that have good power and attain the nominal size.
Example 2: Becker et al. (2003) conducted a study using a sample of 50 ethnic Fijian women. The women completed a self-report questionnaire on dieting and attitudes toward body shape and change. The researchers found that five of the respondents reported at least weekly episodes of binge eating during the previous 6 months. We would like to test whether the proportion of Fijian women engaged in at least weekly episodes of binge eating is different from 0.20. From the data we have p̂ = 5/50 = 0.10.
The results of the test from all the methods discussed here are given in Table 8. As in the previous example, all methods lead to the same conclusion, "do not reject H_0", at the 0.05 significance level. However, we might prefer those tests that have high power and a size close to the nominal size. It should be noted that the methods which showed a significance level larger than the specified α = 0.05 in the simulation have shorter length in the examples. One should be careful in making decisions based on methods that have shorter length, because of their higher chance of a Type I error.
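For readers who wish to reproduce the flavor of these calculations, the following minimal sketch applies an exact two-sided binomial test to both examples. It doubles the smaller tail probability, which corresponds to inverting an equal-tailed Clopper-Pearson interval; this is an assumption about the form of the exact test, and the function names are hypothetical. It is not a reproduction of Tables 7 and 8.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_p(x, n, p0):
    """Exact p-value for H_0: p = p0, taken as twice the smaller tail probability."""
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    return min(1.0, 2 * min(lower, upper))


# Example 1: 24 of 301 women in the IFG stage, H_0: p = 0.063
p_ifg = exact_two_sided_p(24, 301, 0.063)

# Example 2: 5 of 50 women reporting binge eating, H_0: p = 0.20
p_binge = exact_two_sided_p(5, 50, 0.20)
```

Under this sketch both p-values exceed 0.05, which agrees with the "do not reject H_0" conclusion stated for the two examples.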

CONCLUSION
In this study we considered various available test statistics for testing the proportion of a binomial distribution. Since a theoretical comparison is not possible, a simulation study was conducted to compare the performance of the test statistics in terms of attaining the nominal size and the power of the test. Our simulation indicates that, among the proposed test statistics, Methods 6, 8, 10 and 12 performed better than the rest. More broadly, Methods 1, 4, 6, 8, 10, 12 and 13 are promising and can be recommended to practitioners. Practitioners should nevertheless be careful about the hypothesized value of the binomial parameter p. If the hypothesized value is near 0.5, any test is acceptable for moderate to large sample sizes. However, for testing an extreme or small value of p, one might need a very large sample size to achieve good power and an actual size close to the nominal level. Two real-life data sets were analyzed to illustrate the test statistics considered in the study.

ACKNOWLEDGMENT
The authors are thankful to the editor for his valuable comments, which certainly improved the quality and presentation of the paper. This study was written while the second author was on sabbatical leave (2010-2011). He is grateful to Florida International University for awarding him the sabbatical leave, which gave him excellent research facilities.