Comparisons of Test Statistics for Noninferiority Test for the Difference between Two Independent Binomial Proportions

Problem statement: Noninferiority tests are frequently used in clinical trials to demonstrate that the response to a study drug is not much worse than the response to a reference drug. Several test statistics exist. However, these test statistics have not been compared in detail. Moreover, some of them require somewhat complex calculations. Approach: In this study, we investigated the performance of the existing test statistics and proposed new test statistics. Further, we compared them with the existing test methods by means of simulation and devised suitable guidance for using these test statistics. Results: We found that, for the proposed test statistics, the actual type I error was close to the nominal level. Further, when the sample size was moderate, the new test statistics had slightly higher power than the other test statistics. Conclusion: One of the biggest advantages of our method is that it does not require complicated calculations.


INTRODUCTION
A noninferiority test, whose main purpose is to show that the response to a study drug is clinically not much worse than the response to a reference drug, is often conducted in clinical trials. When the response is binary and the treatment groups are independent, the noninferiority test is applied to the difference between two binomial proportions. The ICH-E9 guidelines and the European Medicines Agency guidelines provide a framework for noninferiority comparisons between treatment groups.
Research pertaining to noninferiority tests for the difference between proportions has been conducted for a long time. However, few studies consider the behavior of the test statistics in detail, and such detailed investigation has begun only recently. Dunnett and Gent (1977) presented an example of a noninferiority test from a clinical trial. In their work, an estimator weighted by the noninferiority margin was used for the unknown parameter in the test statistic. Farrington and Manning (1990) proposed three methods for estimating the unknown parameter in the standard error and recommended a restricted maximum likelihood estimator, that is, the maximum likelihood estimator restricted to the null hypothesis, as proposed by Miettinen and Nurminen (1985). The statistical analysis software Power Analysis and Sample Size can calculate power in eight ways. Almendra-Arao (2009) calculated the sizes of noninferiority tests for the difference between two independent proportions based on the Z-statistic with pooled variance under several continuity corrections and analyzed the behavior of these test sizes. Hirotsu et al. (1997) provided confidence intervals that correct for skewness and discussed the design issue of the required sample size for the noninferiority test. Dann and Koch (2008) proposed a method of evaluating the noninferiority test on the basis of several confidence intervals. They also showed the relationship between the confidence intervals and the noninferiority test for the difference between two independent binomial proportions. Zhang et al. (2006) proposed a new test statistic for the noninferiority test for ordered categorical data and extended it to the difference of proportions. In this study, we propose a new test statistic, distinct from the method proposed by Zhang et al. (2006).
We present a method of deriving an estimator for the noninferiority test for the difference between two independent binomial proportions, and we identify and verify a well-performing estimator.

MATERIALS AND METHODS
Suppose that $X_1$ and $X_2$ are two independent random variables with binomial distributions. The first random variable has size $n_1$ and binomial proportion $\pi_1$, denoted as $X_1 \sim B(n_1, \pi_1)$. The second random variable has size $n_2$ and binomial proportion $\pi_2$, denoted as $X_2 \sim B(n_2, \pi_2)$. Throughout this study, we assume that a larger binomial proportion is preferable. The hypotheses of the noninferiority test for the difference between proportions are:

$$H_0: \pi_1 - \pi_2 \le -\Delta_0 \quad \text{vs.} \quad H_1: \pi_1 - \pi_2 > -\Delta_0,$$

where the noninferiority margin is $\Delta_0 > 0$. Let $\delta = \pi_1 - \pi_2$. The difference between the sample proportions, $\hat{\delta} = \hat{\pi}_1 - \hat{\pi}_2$, is the estimator for $\delta$, where $\hat{\pi}_1 = X_1/n_1$ and $\hat{\pi}_2 = X_2/n_2$. The expected value under the null hypothesis $\pi_1 - \pi_2 = -\Delta_0$ is:

$$E(\hat{\delta}) = -\Delta_0.$$

The variance is:

$$V(\hat{\delta}) = \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}.$$

Therefore, the standardized statistic of $\hat{\delta}$ is given by:

$$Z = \frac{\hat{\delta} + \Delta_0}{\sqrt{V(\hat{\delta})}}.$$

This Z-test statistic asymptotically has a standard normal distribution. However, because it involves unknown parameters, several test statistics that estimate these parameters in different ways have been proposed.
Pooled variance: The variance of the estimator under the null hypothesis of a significance test is:

$$V(\hat{\delta}) = \pi(1-\pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$

where the unknown parameter is $\pi = \pi_1 = \pi_2$. This variance is generally known as the pooled variance. Replacing the unknown parameter in this variance with the estimator $\hat{\pi}$, the $Z_P$ test statistic is given by:

$$Z_P = \frac{\hat{\pi}_1 - \hat{\pi}_2 + \Delta_0}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(1/n_1 + 1/n_2\right)}},$$

where the estimator for $\pi$ is $\hat{\pi} = (X_1 + X_2)/(n_1 + n_2)$. If the unknown parameters in the variance of $\hat{\delta}$ are instead replaced by the sample proportions, the test statistic becomes:

$$Z_W = \frac{\hat{\pi}_1 - \hat{\pi}_2 + \Delta_0}{\sqrt{\hat{\pi}_1(1-\hat{\pi}_1)/n_1 + \hat{\pi}_2(1-\hat{\pi}_2)/n_2}}.$$

This is known as the Wald test statistic. Many studies have indicated that the performance of the Wald statistic suffers when the sample size is small. Further, Munzel and Hauschke (2003) presented a framework for the noninferiority test for ordered categorical data. When the number of categories is two, the problem reduces to one concerning the difference between proportions. Hence, this test statistic can also be derived by extending the method proposed by Munzel and Hauschke (2003) to the noninferiority test for the difference between proportions.
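The pooled-variance and Wald statistics described above differ only in how the variance is estimated, which a few lines of code make concrete. The following is a minimal sketch, not the authors' implementation; the function names and the handling of a zero variance estimate (mapping the statistic to 0 or to plus or minus infinity) are our own assumptions.

```python
from math import inf, sqrt
from statistics import NormalDist


def _standardize(num, var):
    """Return num / sqrt(var); a zero variance maps to 0 or +/-inf (our convention)."""
    if var > 0:
        return num / sqrt(var)
    return 0.0 if num == 0 else (inf if num > 0 else -inf)


def z_pooled(x1, n1, x2, n2, delta0):
    """Z_P: variance estimated with the pooled proportion (X1 + X2) / (n1 + n2)."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return _standardize(x1 / n1 - x2 / n2 + delta0, var)


def z_wald(x1, n1, x2, n2, delta0):
    """Z_W: variance estimated with the two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return _standardize(p1 - p2 + delta0, var)


def one_sided_p_value(z):
    """Upper-tail p-value under the standard normal approximation."""
    return 1 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical data: 40/50 responders on the study drug, 42/50 on the reference.
    data = (40, 50, 42, 50, 0.1)
    for name, stat in (("Z_P", z_pooled), ("Z_W", z_wald)):
        z = stat(*data)
        print(name, round(z, 4), round(one_sided_p_value(z), 4))
```

Noninferiority is claimed at the one-sided 2.5% level when the statistic exceeds the upper 2.5% point of the standard normal distribution (about 1.96).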
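Null hypothesis variance 1: The variance of the noninferiority test under the null hypothesis is:

$$V(\hat{\delta}) = \frac{(\pi_2 - \Delta_0)(1 - \pi_2 + \Delta_0)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}.$$

Dunnett and Gent (1977) proposed the estimator:

$$\tilde{\pi}_2 = \frac{X_1 + X_2 + n_1\Delta_0}{n_1 + n_2} \qquad (3)$$

for the unknown parameter $\pi_2$. Using this estimator, the $Z_D$ test statistic is:

$$Z_D = \frac{\hat{\pi}_1 - \hat{\pi}_2 + \Delta_0}{\sqrt{(\tilde{\pi}_2 - \Delta_0)(1 - \tilde{\pi}_2 + \Delta_0)/n_1 + \tilde{\pi}_2(1-\tilde{\pi}_2)/n_2}}.$$

This is called the Dunnett-Gent test statistic. A drawback of this statistic is that the estimator (3) can exceed its upper limit of 1.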
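The remark that the Dunnett-Gent estimator can leave the admissible range is easy to reproduce numerically. The sketch below is illustrative only: the function names are ours, and raising an error for an out-of-range estimate is our own design choice rather than part of the published method.

```python
from math import sqrt


def dunnett_gent_pi2(x1, n1, x2, n2, delta0):
    """Margin-weighted estimate of pi_2 under the constraint pi_1 = pi_2 - delta0."""
    return (x1 + x2 + n1 * delta0) / (n1 + n2)


def z_dunnett_gent(x1, n1, x2, n2, delta0):
    """Z_D: variance evaluated at the Dunnett-Gent estimate of pi_2."""
    pi2 = dunnett_gent_pi2(x1, n1, x2, n2, delta0)
    pi1 = pi2 - delta0
    # Assumption: we refuse to compute Z_D when either estimate is not a
    # proportion, because the variance formula is then meaningless.
    if not (0 <= pi1 <= 1 and 0 <= pi2 <= 1):
        raise ValueError(f"estimates out of range: pi1={pi1:.3f}, pi2={pi2:.3f}")
    var = pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2
    return (x1 / n1 - x2 / n2 + delta0) / sqrt(var)


if __name__ == "__main__":
    # Hypothetical moderate-sample data: the estimate stays inside [0, 1].
    print(round(z_dunnett_gent(40, 50, 42, 50, 0.1), 4))
    # All 10 of 10 respond in both groups: the estimate of pi_2 is above 1.
    print(dunnett_gent_pi2(10, 10, 10, 10, 0.1))
```

With small samples and proportions near 1, the second call shows the estimate of $\pi_2$ exceeding 1, which is the situation in which the statistic should be used with care.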
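Null hypothesis variance 2: Miettinen and Nurminen (1985) constructed a maximum likelihood estimator for the binomial proportion $\pi_2$ restricted to the null hypothesis. Farrington and Manning (1990) proposed a test statistic using this estimator. Up to an additive constant, the log-likelihood function under the restricted null hypothesis $\pi_1 - \pi_2 = -\Delta_0$ is:

$$l(\pi_2) = X_1\log(\pi_2 - \Delta_0) + (n_1 - X_1)\log(1 - \pi_2 + \Delta_0) + X_2\log\pi_2 + (n_2 - X_2)\log(1 - \pi_2).$$

The value of $\pi_2$ that maximizes this function is obtained by solving a cubic equation, which can be written as:

$$\{X_1 - n_1(\pi_2 - \Delta_0)\}\,\pi_2(1-\pi_2) + (X_2 - n_2\pi_2)(\pi_2 - \Delta_0)(1 - \pi_2 + \Delta_0) = 0.$$

The restricted maximum likelihood estimator $\check{\pi}_2$ is the root of this equation that maximizes the log-likelihood over the admissible range $\Delta_0 \le \pi_2 \le 1$. Using this restricted maximum likelihood estimator, the $Z_F$ test statistic is:

$$Z_F = \frac{\hat{\pi}_1 - \hat{\pi}_2 + \Delta_0}{\sqrt{(\check{\pi}_2 - \Delta_0)(1 - \check{\pi}_2 + \Delta_0)/n_1 + \check{\pi}_2(1-\check{\pi}_2)/n_2}}.$$

Null hypothesis variance 3: Zhang et al. (2006) proposed a new test statistic for the noninferiority test for ordered categorical data. They extended it to the difference between proportions and introduced the $Z_C$ test statistic, whose variance is built from the components $\sigma_{10}^2$, $\sigma_{01}^2$ and $\sigma_{00}^2$ and is evaluated under the null hypothesis. Replacing each unknown parameter in the $Z_C$ test statistic by its maximum likelihood estimator defines the $Z_{CE}$ statistic. Kawasaki et al. (2008) applied this test statistic to the confidence interval for the difference between two independent binomial proportions. They showed that the resulting confidence interval performed better than the Wald interval.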
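For the restricted maximum likelihood estimator used in $Z_F$, a numerical maximization of the restricted log-likelihood is often simpler to code than the closed-form solution of the cubic. The sketch below uses a ternary search, which is our substitute technique and relies on the log-likelihood being concave in $\pi_2$. The function names are ours, and `score_cubic` is our own rearrangement of the likelihood equation, included only as a check.

```python
from math import inf, log, sqrt


def _count_log(k, q):
    """k * log(q), with 0 * log(0) taken as 0 and log(q <= 0) taken as -inf."""
    if k == 0:
        return 0.0
    return k * log(q) if q > 0 else -inf


def restricted_loglik(pi2, x1, n1, x2, n2, delta0):
    """Log-likelihood (up to a constant) under pi_1 = pi_2 - delta0."""
    pi1 = pi2 - delta0
    return (_count_log(x1, pi1) + _count_log(n1 - x1, 1 - pi1)
            + _count_log(x2, pi2) + _count_log(n2 - x2, 1 - pi2))


def restricted_mle_pi2(x1, n1, x2, n2, delta0, iterations=200):
    """Maximize the concave restricted log-likelihood over [delta0, 1]."""
    lo, hi = delta0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if (restricted_loglik(m1, x1, n1, x2, n2, delta0)
                < restricted_loglik(m2, x1, n1, x2, n2, delta0)):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2


def score_cubic(pi2, x1, n1, x2, n2, delta0):
    """Cubic likelihood equation in pi_2; zero at an interior maximizer."""
    pi1 = pi2 - delta0
    return (x1 - n1 * pi1) * pi2 * (1 - pi2) + (x2 - n2 * pi2) * pi1 * (1 - pi1)


def z_farrington_manning(x1, n1, x2, n2, delta0):
    """Z_F: variance evaluated at the restricted maximum likelihood estimate."""
    pi2 = restricted_mle_pi2(x1, n1, x2, n2, delta0)
    pi1 = pi2 - delta0
    var = pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2
    return (x1 / n1 - x2 / n2 + delta0) / sqrt(var)


if __name__ == "__main__":
    # Hypothetical data: 40/50 responders versus 42/50, margin 0.1.
    est = restricted_mle_pi2(40, 50, 42, 50, 0.1)
    print(round(est, 6), round(z_farrington_manning(40, 50, 42, 50, 0.1), 4))
```

When the observed difference equals $-\Delta_0$ exactly, the restriction is not binding and the estimate coincides with the sample proportion $X_2/n_2$, which gives a convenient sanity check.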

Null hypothesis variance 4: In the test statistic used by Zhang et al. (2006), the estimators for the unknown parameters in the variance are not unbiased. In this study, we replace them with unbiased estimators to propose a new test statistic, $Z_{CU}$. The unbiased estimators $\tilde{\sigma}_{10}^2$, $\tilde{\sigma}_{01}^2$ and $\tilde{\sigma}_{00}^2$ are functions of $n_1$, $n_2$, $\hat{\pi}_1$ and $\hat{\pi}_2$; in particular, $\tilde{\sigma}_{00}^2$ has the denominator $4(n_1-1)(n_2-1)$. The derivation of these unbiased estimators is given in the Discussion.

RESULTS
We show the validity and usability of each test statistic. In this study, a test is regarded as valid if its actual type I error is close to the nominal level, and as usable if its power is high.
In Table 1, we evaluate whether the actual type I error is at the nominal level of 2.5%. In Table 2, we evaluate whether the actual type I error is at the nominal level of 5%. The actual type I error of each method is calculated from a simulation repeated 100,000 times under each condition. The following points are indicated in Tables 1 and 2. The actual type I error of $Z_W$ exceeded the nominal level when the sample size was small, and it often exceeded the nominal level even when the sample size was moderate. The actual type I errors of $Z_{CE}$, $Z_D$ and $Z_P$ showed similar behaviors, and the actual type I errors of these methods were close to the nominal level except when the sample sizes were small. We found that the actual type I errors of $Z_F$ and $Z_{CU}$ came close to the nominal level even when the sample size was small. Further, when the population proportion was an extreme value, only the actual type I error of $Z_F$ was close to the nominal level. Therefore, we recommend the use of the $Z_F$ test statistic when the population proportion is assumed to be extreme. Taken together, these results indicate that the $Z_F$ and $Z_{CU}$ test statistics have high validity.
In Table 3, we show the actual power of the one-sided test at the nominal level of 2.5%. As for the type I error, the actual power of each method is calculated from a simulation repeated 100,000 times under each condition.
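The actual type I error and the actual power are both rejection probabilities, evaluated at a null or an alternative parameter point respectively. For small sample sizes they can also be computed exactly by summing the joint binomial probabilities of all outcomes that lead to rejection. The sketch below does this for the Wald statistic. It uses exact enumeration rather than the simulation used in the study, and the function names, the example parameter values and the zero-variance convention are our own assumptions.

```python
from math import comb, inf, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def z_wald(x1, n1, x2, n2, delta0):
    """Wald statistic; a zero variance maps to 0 or +/-inf (our convention)."""
    p1, p2 = x1 / n1, x2 / n2
    num = p1 - p2 + delta0
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    if var > 0:
        return num / sqrt(var)
    return 0.0 if num == 0 else (inf if num > 0 else -inf)


def rejection_probability(stat, n1, n2, pi1, pi2, delta0, alpha=0.025):
    """Exact probability that stat exceeds the one-sided normal critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if stat(x1, n1, x2, n2, delta0) > z_crit:
                total += binom_pmf(x1, n1, pi1) * binom_pmf(x2, n2, pi2)
    return total


if __name__ == "__main__":
    # Type I error at the null boundary pi1 = pi2 - delta0 (hypothetical values).
    print(rejection_probability(z_wald, 10, 10, 0.4, 0.5, 0.1))
    # Power when the two proportions are equal.
    print(rejection_probability(z_wald, 10, 10, 0.5, 0.5, 0.1))
```

Passing a different statistic as `stat` gives the corresponding exact size or power, so the same loop can be used to compare methods at a given sample size.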
Further, we found that $Z_F$ and $Z_P$ had lower power; in particular, $Z_F$ had lower power even at large sample sizes. We also found that the power characteristics of each statistic changed with the value of the noninferiority margin in only a few cases. From these results, we infer that the $Z_D$, $Z_{CE}$ and $Z_{CU}$ test statistics have high usability.
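A power comparison of this kind can be sketched with a small Monte Carlo simulation. The code below estimates the power of the pooled-variance and Wald statistics on the same simulated data sets. It is a minimal illustration under our own assumptions (function names, seed, parameter values, and 2,000 replications instead of the 100,000 used in the study) and is not intended to reproduce Table 3.

```python
import random
from math import inf, sqrt
from statistics import NormalDist


def _standardize(num, var):
    """num / sqrt(var); a zero variance maps to 0 or +/-inf (our convention)."""
    if var > 0:
        return num / sqrt(var)
    return 0.0 if num == 0 else (inf if num > 0 else -inf)


def z_pooled(x1, n1, x2, n2, delta0):
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return _standardize(x1 / n1 - x2 / n2 + delta0, var)


def z_wald(x1, n1, x2, n2, delta0):
    p1, p2 = x1 / n1, x2 / n2
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return _standardize(p1 - p2 + delta0, var)


def draw_binomial(rng, n, p):
    """One Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulated_power(stat, n1, n2, pi1, pi2, delta0,
                    alpha=0.025, reps=2000, seed=1):
    """Share of simulated trials in which stat exceeds the critical value."""
    rng = random.Random(seed)  # fixed seed: every statistic sees the same data
    z_crit = NormalDist().inv_cdf(1 - alpha)
    hits = 0
    for _ in range(reps):
        x1 = draw_binomial(rng, n1, pi1)
        x2 = draw_binomial(rng, n2, pi2)
        hits += stat(x1, n1, x2, n2, delta0) > z_crit
    return hits / reps


if __name__ == "__main__":
    # Hypothetical scenario: equal true proportions of 0.8, margin 0.1.
    for name, stat in (("Z_P", z_pooled), ("Z_W", z_wald)):
        print(name, simulated_power(stat, 100, 100, 0.8, 0.8, 0.1))
```

Because the seed is fixed inside `simulated_power`, the statistics are compared on identical simulated trials, which removes simulation noise from the difference between them.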

DISCUSSION
The derivation of the unbiased estimators is given in this section. Let us consider a nonparametric two-sample situation in which the variables $Y_{11}, Y_{12}, \ldots, Y_{1n_1} \sim Y_1$ and $Y_{21}, Y_{22}, \ldots, Y_{2n_2} \sim Y_2$ are mutually independent. For the purpose of formulating a nonparametric test, a pivotal probability is advocated by some authors. The nonparametric test for noninferiority may be formulated in terms of this probability as hypothesis (7), where $\delta_0$ is the noninferiority margin and $\delta_0 < 0$. With $\phi$ a function of two real variables that defines the estimator of this probability, the resulting standardized statistic is asymptotically normally distributed. However, we cannot use this test statistic as it stands; the unknown parameters in the Z-test statistic must be replaced by estimators. Munzel and Hauschke (2003) proposed a test statistic for hypothesis (7) along these lines. The empirical estimators of $P_2$ and $P_3$ are:

$$\hat{p}_2 = \frac{1}{n_1 n_2^2}\sum_{i=1}^{n_1} U_{i\cdot}^2, \qquad \hat{p}_3 = \frac{1}{n_1^2 n_2}\sum_{j=1}^{n_2} U_{\cdot j}^2.$$

Zhang et al. (2006) pointed out that one problem with this statistic is that it uses the variance under the alternative hypothesis. They proposed the test statistic $Z_C$, which uses the variance under the null hypothesis. The test statistic $Z_C$ asymptotically follows the standard normal distribution. However, it cannot be used as it is, because it also contains unknown parameters. They substituted the estimators in expressions (8) and (9), with $\hat{\sigma}_{00}^2 = \hat{p}_1(1-\hat{p}_1)$, and Zhang et al. (2006) call the resulting $Z_{CE}$ test statistic an empirical test statistic. We derive unbiased estimators for the unknown parameters in the $Z_C$ test statistic. Starting from unbiased estimators of $P_2$ and $P_3$, the resulting variance estimators $\tilde{\sigma}_{10}^2$ and $\tilde{\sigma}_{01}^2$ are unbiased and consistent. Therefore, the $Z_{CU}$ test statistic is proposed by using these unbiased estimators in $Z_C$. We let $Y_1$ and $Y_2$ be two independent Bernoulli random variables with success probabilities $\pi_1$ and $\pi_2$, respectively.
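Through simple calculation, $P_1$ and its estimator can be expressed in terms of the binomial proportions, and the hypothesis for noninferiority, expression (7), can be represented in terms of $\pi_1 - \pi_2$. The variance components become:

$$\sigma_{10}^2 = \frac{\pi_1 - \pi_1^2}{4}, \qquad \sigma_{01}^2 = \frac{\pi_2 - \pi_2^2}{4}, \qquad \sigma_{00}^2 = \frac{1 - (\pi_1 - \pi_2)^2}{4}.$$

The $Z_{CE}$ test statistic is obtained by substituting the corresponding estimators. We can also obtain expressions for the relationship between the empirical estimators and the unbiased estimators of $P_2$ and $P_3$. Substituting (13) and (14)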

CONCLUSION
In this study, we investigated the validity and usability of test statistics in the noninferiority test for the difference between two independent binomial proportions.
We found that the power of the $Z_P$ test statistic is generally low. We suppose that this results from using a variance that assumes the null hypothesis of a significance test.
We found that the $Z_W$ test statistic showed higher power than the $Z_P$ test statistic. However, its actual level frequently exceeded the nominal level. Therefore, the $Z_W$ test statistic does not satisfy the validity requirement, and using this method only because its power is high might lead to a wrong conclusion.
The $Z_D$ test statistic showed better power. However, it should be used judiciously, since the estimator of the nuisance parameter used in this test statistic may exceed its limit value.
We found that the $Z_F$ test statistic satisfies the validity requirement in the noninferiority test. In particular, it is the only method whose type I error comes close to the nominal level when the population proportion is an extreme value. However, we also found that the power of this method is comparatively low. Moreover, calculating this test statistic is somewhat complicated, since it uses a restricted maximum likelihood estimator.
In conclusion, we found that the proposed $Z_{CE}$ and $Z_{CU}$ test statistics have type I errors that are comparatively close to the nominal level and reasonably high power. This is particularly true of the $Z_{CU}$ test statistic, which uses unbiased estimators and behaves stably in the hypothesis test. In addition, one of the biggest advantages of our method is that it does not require complicated calculations.