A Distribution-Free Bayesian Approach for Indirect Comparisons

E-mail: jack_tubbs@baylor.edu Abstract: The objective of this paper is to consider an indirect comparison between treatments B and C when each have been compared directly to treatment A in separate studies. Problems of this type are common in Network Meta-Analysis (NMA). A commonly used method assumes that the underlying data, Y, are normally distributed and μ = E(Y) is the measure of clinical effectiveness. The normal assumption is often violated. In addition, the sample sizes are not necessarily large. These conditions challenge the concept that a single location parameter, such as, the mean or median should be used as the measure of clinical effectiveness in the analysis. In this paper, we present an alternative approach where the Area Under the ROC Curve (AUC) is used as the measure of clinical effectiveness. Since the normal distribution may be uncertain, we use a distributionfree Bayesian mixtures of Finite Polya Trees (MFPT) model with the AUC in order to make the indirect comparison.


Introduction
In a standard Meta-Analysis (MA), aggregate effects from different studies of a single treatment or the comparison between a single treatment and a common control are combined using a statistical model. The basic elements of the meta-analysis model include two parts, a model for the within study variability and a model for the between-study variability. In a fixed effects analysis, the between-study variability is assumed to be 0. A very commonly used MA is based on a normal-normal model for the two components (Viechtbauer, 2010). Models of this type allow one to estimate a pooled effect by combining the results from the individual studies.
Suppose the objective is to compare different treatments from several studies that are linked through common comparisons, then a Network Meta-Analysis (NMA) is used instead of a meta-analysis. In a NMA, indirect comparisons between treatments are possible whenever two or more studies compare different treatment effects with a common control. Li et al. (2018) presents a NMA for advanced or metastatic non-small cell lung cancer where each of the studies involve various therapeutics when compared with a common control, similar dosage of Docetaxel. In this paper, we use two studies, KEYNOTE-010 and CHECKMATE 057, from the larger NMA as illustrated in Fig. 1 as a motivating example where the objective is to make an indirect comparison between treatments Nivolumab and Pembrolizumab, when each have been compared directly to Docetaxel. An indirect comparison of treatments Nivolumab and Pembrolizumab is possible if the clinical measure of effectiveness, Y, is used for each direct comparison and if the assumption of consistency holds. It should be noted that the studies considered in Li et al. (2018) had the Overall Survival rate (OS) as a primary endpoint. A more common NMA assumes that µ = E [Y] is the measure of clinical effectiveness where the aggregate data for the study, Y , have a normal distribution.
In a frequentist analysis, a common practice is to estimate the combined effect between all the studies using inverse weighting of the mean of the study effects (Sutton et al., 2000;Der Simonian and Laird, 2015). In a Bayesian analysis, a typical approach uses a hierarchical model with a normal distribution specified on the study effects (Sutton and Abrams, 2001). When the above assumption holds and E[Y] is used as the measure of clinical effectiveness, then an indirect comparison of B and C is consistent since: Suppose the normal assumption does not hold, then the parameter µ = E (Y) may not be an appropriate measure of clinical effectiveness. This situation often happens when the density functions for the continuous random variable Y in the NMA are skewed or multi-modal.
The purpose of this paper is to investigate an alternative method for making the indirect comparison when the normal assumption does not hold. We propose two modifications to the traditional NMA for making an indirect comparison between treatments B and C with continuous data. The first modification involves the use the Area Under the receiver operating Curve (AUC) as the measure of clinical effectiveness when making a direct comparison between treatments A and B and A and C. The AUC AB is chosen since the AUC AB = Pr [Y A > Y B ] does not depend upon the distribution of Y yet, involves the entire distribution of Y rather than a single location parameter, such as the mean or median. The second modification involves the use a distribution-free Bayesian approach when constructing the hierarchical model. A Mixture of a Finite Polya Tree (MFPT) model is used.
The outline for the paper is as follows. A description of the MFPT model is given in section 2. Section 3 provides a brief summary of the ROC and AUC for treatments A and B. The special cases when the response variable Y are normally distributed and the distributionfree MFPT estimates of the ROC and AUC are also presented in Section 3. The network models for an indirect comparison of treatments B and C are given in Section 4. A simulation study comparing the performance of the proposed method with the traditional normal model is presented in Section 5. The proposed method is illustrated using data from two Diabetic Macular Edema (DME) studies in Section 6. A summary and discussion is given in section 7.

Mixtures of Finite Polya Trees
A flexible class of methods given in Bayesian nonparametric literature have been proposed for describing distributions that exhibit features such as multi-modality and skewness. One such model is a mixture of finite Polya Trees. As in any Bayesian approach, the posterior density function is proportional to the product of the likelihood and a joint prior distribution. These components for the Polya tree model are presented in the next two sections. Christensen et al. (2008) illustrate mixtures of finite Polya trees that are centered on a N(µ,σ 2 ) family of distributions. Flexibility from the normal distribution family is achieved by reweighting sections of the distribution through stagewise partitioning of the support. At each successive partition, the conditional probability of being above and below the split is altered through the introduction of a new set of parameters. The procedure given in Christensen et al. (2008) enables one to express the density of Y at the M th stage as:

Prior Distributions
The parameters of the Polya tree model are divided into two sections. The first, denoted by Ψ, consists of the indexing parameters for the base distribution. When Y ∼ N (µ, σ 2 ), then Ψ consists of µ and σ 2 . The second section consists of the bin probabilities θ ms ∈ Θ for m = 1,..., M and s = 1,...,2 m where Θ contains 2 M+1 −2 parameters. The flexibility of the model is achieved through the choice of the bin probabilities.
Suppose that one has expert opinion that one wishes to incorporate into the model. For example, suppose at the first stage that one has expert knowledge concerning Pr [Y ≤ µ]. At the second stage, one can specify Pr [Y ≤ q 1 |Y ≤ µ] and Pr [Y > q 1 |Y ≤ µ] where q 1 is the first quantile for Y. This process of eliciting information can be performed for each of the M stages.
If expert opinion is not available then an alternative approach uses the beta distribution as a reference prior. That is, let:

( )
,2 ,2 ,2 1 , m s m s m s Beta θ α α + ∼ and θ m , 2s+1 = 1 -θ m,2s for m = 1,..., M and s = 1,3,5,..., 2 m −1 where the pairs of bin probabilities are assumed to be independent. In which case, one exchanges specifying priors on Θ with the hyperparameters denoted by α. If the prior information is unknown, a reference prior in the form of α ms = cρ(m) is commonly used where c reflects the strength of the prior belief in the original parametric family and ρ(·) is a nondecreasing function. In which case, prior specification on each α ms is replaced by specifying a single prior on c. This formulation ensures that the beta prior is symmetric with decreasing variability as the tree level m increases. In which case, the conditional bin probabilities at lower tree levels are less likely to deviate from the base distribution than higher tree levels.
Fixed priors specified on the parameters of the base distribution result in a finite Polya tree. A Finite Polya Tree (FPT) with M levels, reference prior cρ(·), parametric centering family * F Ψ and priors on the θ ms is denoted as: ( , , ).
The posterior density using a finite Polya tree model is potentially discontinuous at the boundaries of the bins. A solution to this problem is to specify a prior distribution on the parameters of the base distribution (Christensen et al. (2008)). That is, by introducing a prior on Ψ = (µ,σ 2 ) when Y ∼ N(µ,σ 2 ), the median, quartiles and all subsequent splits are no longer fixed but are random. A Mixture of Finite Polya Trees (MFPT) is given by: which is obtained after generating a set of {θ ms } from the prior distribution on {θ ms } and averaging over the prior on the support Ψ. The MFPT model for the distribution, F, is denoted as: There are situations when E[Y] = µ may not be an appropriate measure of clinical effectiveness, in which case, we propose using the AUC. In the next section, a non-parametric estimate of the AUC is presented when the model described in Equation 5 holds.

ROC and AUC
The Receiver Operating Curve (ROC) and area under the ROC (AUC) can be used to compare two distributions. In the MFPT model, the relationship between the ROC and survival functions for each treatment are of special interest as given by: where, S g (·) = 1 − F g (·) is the survival function for treatment g, g ∈{A,B} and t ∈ [0,1] (Pepe et al., 2009). The AUC is: Bayesian non-parametric models in ROC analysis can be found in Branscum et al. (2008; and Hanson, (2006).

ROC and AUC for the Binormal Model
In the special case when , the ROC and AUC have a closed form expression (Dorfman and Alf (1968) and Faraggi and Reiser (2002)) given by: where Φ is CDF for the standard normal distribution function: , .

ROC and AUC for the MFPT Model
The MFPT model provides distribution-free estimates for the ROC by replacing F g and 1 g F − in Equation 6 with the posterior distribution function and inverse of the posterior distribution function. The posterior distribution function, F g , for treatment g is: where, k(y) indicates the bin at level M that contains observation y from group g, * F Ψ (y) is the base distribution function and: is the unconditional posterior probability of bin i at level M for i = 1,..., k(y) (Hanson, 2006). Θ g (i) is the set of probabilities, θ gms , for group g, m = 1,..., M and where s indicates the bin at the m th level that contains the i th bin at level M.
The inverse posterior distribution function, where, Q is such that Hanson, 2006). The posterior density function is given by: Numerical integration of the ROC is used to provide an estimate for the AUC.

Network Models with Random Effects
The two primary methods used to combine information from studies are fixed and random effects models (Stangl and Berry, 2000). A fixed effects model is used when the studies are homogeneous, in which case the studies do not differ in terms of the underlying study population, patient selection criteria and application of therapies. The fixed effects model is highly restrictive, in that, it assumes that patients are exchangeable between studies. A random effects model is appropriate under a less rigid assumption of homogeneity, where each study effect is modeled as a random effect from a common distribution of study effects.
A random effects model is used in the scenario described in Fig. 1 where the direct comparisons with effect A are estimated from two studies, one with B and one with C.

Bayesian Normal Network Model
In this section, the commonly used Bayesian parametric normal model is described where one assumes that the observations for the treatment and control groups are random samples from normal populations. That is, let: for subject i = 1,..., N jk , study j = 1,..., J k and treatment k∈{A,B,C} where J A = 2 and J B = J C = 1. The mean, µ jk , in a random effects model is: where, β jA is the random effect for the control group and β k is the treatment effect for k∈{B,C}. The random effects model assumes that the control group effect, β jA , is randomly selected from a normal population of effects with mean m A and standard deviation, s A . Independent uniform priors, σ jk ∼ U(0, B σk ), are specified on the standard deviations in Equation 11 and independent normal priors on the regression coefficients, β A and β k , in Equation 12. A hyperprior, , is specified on the mean of the random effects on A and a uniform hyperprior, , on the standard deviation on A. The Bayesian normal random effects model can be written as: with priors on the model parameters: and hyperpriors on the parameters of the random effects:

MFPT Network Model
In this section, the MFPT model is specified when the effect on the control group, A, is assumed to be random.
That is, let: and treatment effects for groups, k∈{B,C}: Independent normal priors are specified on the means, µ jA ∼N(m A , s A ) and µ k ∼N(m k , s k ) and independent uniform priors on the standard deviations,  The Bayesian MFPT random effects model on A is written as: with priors on the model parameters: and hyperpriors on the random effects parameters: The model on the k th treatment effect is written as: with prior structure:

Simulation Study
A simulation is performed to evaluate the performance of the MFPT model using the AUC as a measure of clinical effectiveness. The type I error in testing the indirect comparison of treatments B and C will be of particular interest.
The null hypothesis is: when the AUC is the measure of effectiveness in the MFPT model. The ratio of AUCs is chosen instead of a difference in AUCs for the sake of interpretability. For example, if R CB = 1.2 we can interpret this as P(C>A) is 1.2 times larger than P(B>A). The simulations involved two data types, normal and Gumbel (extreme value). The pdf for the Gumbel distribution is: for v∈R, the mode, µ and scale parameter, λ. When the data are normal, the indirect comparison of B and C using the MFPT model is compared with the traditional binormal model given in Section 4.1 using µ as a measure of clinical effectiveness and the null hypothesis given by: Table 1 summarizes the design parameters and true values for the AUC. The design parameters for the three designs using the extreme value data are summarized in Table 2. The performance of the MFPT model is evaluated using both data types.
Rejection of the null hypotheses in Equation 30 and 28 are based on the (1-α) 100% equal-tailed credible set from the Markov Chain Monte Carlo simulation to control the type I error at α = 0.05. For design D1 and D3, the null hypothesis of equality of B and C should not be rejected; whereas in design D2 the null hypothesis should be rejected. The percentage of times the null hypothesis are rejected for the normal and extreme value simulations using N = 250 iterations are reported in Table 3 and 4.

Results with Normal Data
In this section, the comparison of B and C through the null hypothesis in Equation 30 is appropriate using the estimates from the binormal random effects model. In D1 where the null hypothesis should not be rejected more than 5% of the times, the rejection rate of Equation 30 is 5% using the binormal model. Using the ratio of AUC's in Equation 28 has the same rejection rate as that found in the binormal model. The MFPT model performs similarly with a rejection rate of 6% and 4% testing Equation 28 and 30, respectively. In D2 where the null hypothesis is false, the null hypotheses in Equation 28 and 30 are correctly rejected 100% of the times using the binormal and MFPT models. In D3, the rejection rate of Equation 30 and 28 is 3% and 4%, respectively, using the binormal model. The MFPT model has a rejection rate of 3% and 5% testing Equation 30 and 28, respectively. The type I error is controlled close to the nominal rate using each measure of clinical effectiveness with the binormal and MFPT models.

Results with Extreme Value Data
In this section, it is no longer appropriate to use the binormal model or difference in means to compare B and C. Instead, the results for the rejection rate of the null hypothesis with the MFPT model using a ratio of AUC's is reported in Table 4. In D1, the rejection rate of Equation 28 is 1%. The null hypothesis is correctly rejected 100% of the times in D2. In D3, where the null hypothesis should not be rejected, the rejection rate is 2% using the ratio of AUC's from the MFPT model. The random effects MFPT model using the ratio of AUC's is slightly conservative testing Equation 28.
The results presented here are very representative of the results found when doing a more exhaustive simulation study.
In the next section, we illustrate the methods given above with data from two published clinical trial with diabetic patients with macular edema.

Example with Diabetic Macular Edema
Diabetic Macular Edema (DME) is a form of diabetic retinopathy that impairs the central vision. The current standard treatment is focal/grid photocoagulation (laser therapy) from which vision stabilization is an optimistic outcome where a primary endpoint is the change from baseline in the visual acuity score and a secondary endpoint is the change from baseline in central retinal thickness (CRT). Two randomized controlled trials, RESTORE (NCT00687804) and VIVID (NCT01331681), compared the efficacy of Ranibizumab (R) and Intravitreal Aflibercept (IAI) to laser therapy (control). A description of RESTORE and VIVID and results for the secondary endpoint, change from baseline in CRT, are given in Sections 6.1 and 6.2. A comparison of the efficacy of R and IAI is of interest. However, there are no available trials that directly compare the two therapeutics. The random effects MFPT model for an indirect comparison between R and IAI is given in Section 6.3.

RESTORE Trial
Novartis, (2013) performed a randomized, doubleblind, multi-center, phase 3 trial with a laser control arm and two treatment arms. The laser photocoagulation treatment was administered at the commencement of treatment. If necessary, additional treatments were administered at intervals of at least 3 months. Ranibizumab 0.5 mg was administered monthly by intravitreal injection in the study eye for 3 months. The treatment was suspended following the third injection if one of the specified criteria reported by Novartis, (2013) was met. The study results for the change from baseline at month 12 in central retinal thickness of the study eye is given in Table 5.

VIVID Trial
A double-blind, randomized, phase 3 study of the efficacy and safety of IAI administration of VEGF TRap-Eye with a laser control arm in patients with DME was completed by Bayer and Regeneron Pharmaceuticals (2016). The laser control arm was administered at baseline, with additional treatments at intervals of at least 3 months when necessary. Patients in the IAI 2Q4 group received 2mg Intravitreal aflibercept injection every 4 weeks. The study results for the change from baseline at month 12 in CRT of the study eye is given in Table 6.

Indirect Comparison of RESTORE and VIVID Trials
The summary statistics from RESTORE and VIVID trials were used to simulate the data presented in this section. The treatment arms of interest are Ranibizumab 0.5mg from the RESTORE trial and IAI 2Q4 from the VIVID trial. Data are adjusted so that a larger response correspond to a more effective treatment. Summary statistics for the simulated data are given in. Table 7 note that the data are skewed with the median less than the mean. Figure 2 illustrates the sample densities of the four treatment arms. The objective of the study is to make an indirect comparison between Ranibizumab 0.5mg and IAI 2Q4. The random effects MFPT model used to make an indirect comparison between B and C has two components. The model for the random effects on A and the model for the treatment effects.
and the model on the treatment effects as:  (Novartis, 2011;Bayer and Regeneron Pharmaceuticals, 2016) A sensitivity analysis for the specification of the upper bound on the uniform prior on s A and σ k in Equation 36 and 41 is illustrated in Fig. 3. The MFPT model for the network of treatments is fitted with a range of upper bounds, B, on the uniform priors. Figure 3 presents the 95% credible set on R CB , the ratio of AUCs. This example is an interesting scenario where the choice of maxB potentially influences the inference illustrating the importance of conducting a sensitivity analysis on the prior selection. The width of the credible sets on the ratio of AUCs conservatively has stabilized beyond maxB = 230 indicating that the choice of a U(0, 250) prior does not influence the posterior results in terms of the variability around the estimate of the ratio of AUCs.  Figure 4 presents satisfactory convergence diagnostics for R CB . The posterior density plot is centered over the true ratio of AUCs as indicated by an asterisk. The empirical ratio of AUCs is represented by the triangle. The posterior probability that R CB is greater than 1 is 97.3%. Recall that having R CB > 1 indicates that the numerator effect (IAI) is more effective then the denominator effect (R) when compared with the laser control.

Discussion
This paper presented a method for making indirect comparisons between treatments when the commonly used normal method for the NMA framework was inappropriate. The method made use of a distribution-free Bayesian MFPT to develop a random effects model when the AUC is the measure of clinical effectiveness. A simulation study is performed to evaluate the performance of the MFPT model using the AUC as a measure of clinical effectiveness. The type I errors for the proposed method is comparable to those found using the binormal model when the data are simulated from a normal distribution. The proposed method performed well when the data were simulated from an extreme value distribution. In both simulations the method was able to control the type I error when testing the indirect comparison of the two treatments.
There is one caveat to the ratio of AUCs. For the case where there is a complete separation between the treatment and control group, the AUC is 1 regardless of the degree of separation between the control and treatment groups. If this is the case for both studies, the ratio of AUCs will not work since the ratio equals 1 and a conclusion of no significant difference is made when it is possible that there might be a difference between the two groups. Since this case is so extreme, one suspects that a simple visual inspection of the respective histograms or density functions will keep one from making this error.
The proposed method was demonstrated using two clinical trials for treatment of Diabetic Macular Edema (DME). The two trials, RESTORE and VIVID, separately investigated the safety and efficacy of Ranibizumab and Aflibercept as compared to a laser control. The example illustrated the capabilities of the proposed method of making an indirect comparison between two treatments that have been compared separately to common control. Since, we used the summary statistics from the two studies to generate simulated data we were not overly concerned with the outcome of this exercise. It should be noted, that the real value of the procedure presented in this paper might be in the in-house investigations that pharmas undertake when trying to access theirs (and other's) drug pipeline.
In conclusion, the MFPT model provides a reasonable alternative for an indirect comparison within a network meta-analysis when the normality assumption for the measure of clinical effectiveness might not hold. The use of the ratio of AUCs has the added benefit of providing a stochastic interpretation, AUC AB = Pr[Y B >Y A ], instead of using the difference in location parameters for Y in the two treatment groups. An additional advantage of using the AUCs is the ability to include covariates, X, AUC AB (X) = Pr[Y B >Y A | X]. We were not able to demonstrate the use of covariates with the DME examples since we did not have access to the subject-wise data. It should be noted that modeling the AUC as a function of a continuous covariates X is somewhat problematic. A potentially better approach would be to model the ROC as a function of covariates X (Alonzo and Pepe (2002) ;Cai, (2004); Rodriguez-Alvarez et al. (2011) and Stanley and Tubbs, (2018). Stanley and Tubbs, (2018) utilized the relationship that the ROC AB|X (t) = Pr[PV AB ≤ t | X] where PV AB is the conditional placement value when an observation from B is placed into the survival curve for group A. One can perform the indirect comparison by testing the hypothesis that the CDFs defined by the two ROCs are equal using standard test of fit procedures. This work will appear in a separate manuscript.

Conclusion
The MFPT model provides a reasonable alternative for an indirect comparison within a network meta-analysis when the normality assumption for the measure of clinical effectiveness might not hold. The use of the ratio of AUCs has the added benefit of providing a stochastic interpretation, AUC AB = Pr[Y B >Y A ], instead of using the difference in location parameters for Y in the two treatment groups.