Application of a Beta Regression Model for Covariate Adjusted ROC

Corresponding Author: J.D. Tubbs, Department of Statistical Science, Baylor University, Waco, USA. Email: jack_tubbs@baylor.edu

Abstract: The Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) are widely used in determining the diagnostic capability of a binary classification procedure. Since test performance is affected by covariates, the ROC and AUC have been modeled in a Generalized Linear Model (GLM) setting. In this study, we revisit a problem where the AUC regression model was used in a clinical study with discrete covariates by considering ROC regression models with both discrete and continuous covariates. The two ROC regression models are a widely used parametric model and a recently published model that fits the placement values with the beta distribution. The two methods are illustrated using data from a clinical study.


Introduction
The Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) are widely used measures of the accuracy of a diagnostic test in distinguishing between two populations. An important application of the ROC curve is to determine how a test's performance is affected by covariates. One approach is to model the AUC of the ROC curve by modifying the Mann-Whitney (MW) statistic as a GLM (Pepe, 2003). Another approach is to model the ROC directly. Dodd and Pepe (2003) proposed a Generalized Linear Model (GLM) framework to directly model the ROC with covariates as follows:

$$ROC_X(t) = g(h_0(t) + \beta' X) \qquad (1)$$

for $t \in (0,1)$, where $g^{-1}$ is a monotone link function, $X$ is a vector of covariates, $h_0(\cdot)$ is an unknown monotonic increasing function and $\beta$ is a vector of the model parameters. Assumptions concerning $h_0(\cdot)$ define whether (1) is parametric (Alonzo and Pepe, 2002) or semi-parametric (Cai, 2004). Although the two models differ, they are both based upon the conditional expectation of the Mann-Whitney U-statistic. Stanley and Tubbs (2018) presented an alternative GLM model for the ROC as a function of the covariate-adjusted placement values. They compared their model with the parametric and semi-parametric models using simulated normal and extreme value data.
The objective of this paper is to compare the parametric and beta ROC regression models with the AUC regression model presented by Zhang et al. (2011), using data from a clinical trial concerning the efficacy of an active drug to treat stress urinary incontinence in North American women.
The outline for this paper is as follows. Section 2 presents a brief overview of the two ROC regression methods. The results for the two methods using the incontinence trial data are reported in Section 3. The paper concludes with a discussion in Section 4.

Methods
Let Y be a continuous random variable used to distinguish between the two populations. Assume that the non-diseased or control population is indicated by D = 0. Let D = 1 denote the diseased or case population of interest and assume that large values of Y are more likely to be associated with disease. The classifier assigns a subject to the diseased group if Y ≥ c, in which case the true positive rate of the test is TPR(c) = Pr[Y ≥ c|D = 1] and the false positive rate of the test is FPR(c) = Pr[Y ≥ c|D = 0]. The ROC curve is defined as the collection of all (FPR(c), TPR(c)) pairs obtained as the threshold c varies.
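The definitions of TPR(c) and FPR(c) can be illustrated with a minimal Python sketch. The data values and the function names `rates_at_threshold` and `empirical_roc` are hypothetical and serve only as an illustration; they are not taken from the clinical study.

```python
def rates_at_threshold(cases, controls, c):
    """Empirical TPR(c) and FPR(c) for the rule 'classify as diseased if Y >= c'."""
    tpr = sum(y >= c for y in cases) / len(cases)
    fpr = sum(y >= c for y in controls) / len(controls)
    return tpr, fpr


def empirical_roc(cases, controls):
    """Collect the (FPR, TPR) pairs over all observed thresholds, largest first."""
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every observation
    for c in thresholds:
        tpr, fpr = rates_at_threshold(cases, controls, c)
        points.append((fpr, tpr))
    return points


# Hypothetical toy data (D = 0 controls, D = 1 cases), for illustration only.
controls = [1.0, 2.0, 3.0, 4.0, 5.0]
cases = [2.5, 3.5, 4.5, 5.5]

tpr, fpr = rates_at_threshold(cases, controls, c=4.5)
roc_points = empirical_roc(cases, controls)
```

Each threshold contributes one point, so the empirical curve is a step function that runs from (0, 0) to (1, 1).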
The placement value of Y, denoted as $PV_{D=0}$, is the proportion of the reference or control population with observations greater than Y (the survival value for Y in the reference or control population). This is just a transformation of Y given by $PV_{D=0} = S_0(Y)$. It has been shown that the CDF of the placement values of the diseased subjects is the ROC. That is:

$$ROC(t) = \Pr[PV_{D=0} \le t \mid D = 1] = \Pr[S_0(Y) \le t \mid D = 1] = S_1(S_0^{-1}(t))$$

where $t \in (0,1)$ and $S_1$, $S_0$ are the survival functions for the diseased and non-diseased populations.
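The relationship between placement values and the ROC can be checked numerically. The sketch below uses hypothetical toy data and hypothetical function names, and it uses the empirical survival function with the convention Y ≥ c from the classifier definition (for continuous data without ties this agrees with "greater than").

```python
def survival(sample, y):
    """Empirical survival function: proportion of the sample that is >= y."""
    return sum(v >= y for v in sample) / len(sample)


def placement_values(cases, controls):
    """Placement value of each diseased observation in the control population."""
    return [survival(controls, y) for y in cases]


def roc_from_pv(pvs, t):
    """ROC(t) estimated as the empirical CDF of the placement values at t."""
    return sum(pv <= t for pv in pvs) / len(pvs)


# Hypothetical toy data, for illustration only.
controls = [1.0, 2.0, 3.0, 4.0, 5.0]
cases = [2.5, 3.5, 4.5, 5.5]

pvs = placement_values(cases, controls)
roc_at_02 = roc_from_pv(pvs, 0.2)

# TPR at the threshold c = 4.5, whose empirical FPR is 0.2.
tpr_at_c = survival(cases, 4.5)

# AUC as one minus the mean placement value, and as the Mann-Whitney proportion.
auc_pv = 1 - sum(pvs) / len(pvs)
auc_mw = sum(y1 > y0 for y1 in cases for y0 in controls) / (len(cases) * len(controls))
```

In this toy example the empirical CDF of the placement values at t = 0.2 equals the TPR at the threshold whose FPR is 0.2, and one minus the mean placement value equals the Mann-Whitney proportion of (case, control) pairs in which the case has the larger value.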
Considering the covariates, denoted by X, the covariate-adjusted ROC can be written as:

$$ROC_X(t) = S_{1,X}(S_{0,X}^{-1}(t))$$

That is, $ROC_X(t)$ is the probability that the test result Y of a diseased subject is greater than or equal to the covariate-adjusted threshold $S_{0,X}^{-1}(t)$, which is the $(1-t)$th quantile of the test results of unaffected subjects with the same covariates.
The ROC is the CDF of the placement values $PV_{D=0}$ (Pepe and Cai, 2004). The covariate-adjusted notation is given by:

$$ROC_X(t) = \Pr[PV_{D=0,X} \le t \mid X] = \Pr[S_{0,X}(Y) \le t \mid D = 1, X]$$

Stanley and Tubbs (2018) provide a description of the algorithms used to model the ROC with the parametric model presented by Alonzo and Pepe (2002) and with the beta placement value model. A brief description of both methods is included for completeness.
Alonzo and Pepe (2002) extended the use of the ROC-GLM by considering the ROC curve as a parametric function of covariates and using a binary indicator as the dependent variable. The parametric function of covariates is reflected in the parametric form of $h_0(\cdot)$. The binary indicators compare the test result for a diseased subject to a specified set of covariate-adjusted quantiles of the distribution of test results from non-diseased subjects. The binary values can then be modeled using logistic regression methods. Their parametric form for $h_0(\cdot)$ is:

Beta Regression Method
Stanley and Tubbs (2018) proposed a method that models the placement values using beta regression. This method is easy to implement and it avoids the dependence among the binary indicators that arises in the models based on the logit or probit. The beta regression model can be written as a GLM (Ferrari and Cribari-Neto, 2004) in terms of its mean $\mu = E(Y)$ and precision parameter $\phi = a + b$, where the mean and variance for $Y \sim \mathrm{Beta}(a, b)$ can be written as:

$$E(Y) = \mu = \frac{a}{a + b}$$

and:

$$\mathrm{Var}(Y) = \frac{\mu(1 - \mu)}{1 + \phi}$$

The beta regression model can be written as:

$$g^{-1}(\mu) = X'\beta$$

where $g^{-1}$ is a monotone link function such as the logit.
The algorithm for the beta regression method can be written as:
1. Specify a set of FPRs: $T = \{t_l : l = 1, \ldots, n_T\} \subset (0,1)$
2. Estimate the covariate-specific survival function $\hat{S}_{0,X_j}$ for the reference population at each $t \in T$ using quantile regression
3. For each diseased observation, calculate the placement value $PV = \hat{S}_{0,X_j}(Y)$
4. Perform a beta regression on the PVs to obtain estimates $\hat{\beta}$ and $\hat{\phi}$
5. Transform to obtain $\hat{a} = \hat{\mu}\hat{\phi}$ and $\hat{b} = (1 - \hat{\mu})\hat{\phi}$
6. Calculate the CDF of the placement values using the $\mathrm{Beta}(\hat{a}, \hat{b})$ distribution to obtain the ROC and the AUC

Application
Zhang et al. (2011) presented results for AUC regression using data from a placebo-controlled study to determine the efficacy of an active drug to treat stress urinary incontinence in menopausal women. Their primary endpoint was the relative Percent reduction in Incontinence Episode Frequency (PIEF) from baseline to the final visit (12 weeks), where a larger PIEF reduction indicates the desired treatment effect. They considered two discrete covariates: strata and horm50. The covariate strata indicates the severity of disease at baseline, where 1 indicates the lowest level and 4 represents the highest number of episodes. The second covariate, horm50, is binary, where 1 indicates that the subject had hormone replacement therapy prior to the start of the study. Zhang et al. (2011) indicated that they elected to reduce the computational complexity of their example by selecting a 10% random sample (n = 407) from the total available subjects.
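Steps 5 and 6 of the beta regression algorithm in the Beta Regression Method section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the values of μ and φ are hypothetical (they are not estimates from the incontinence data), the function names are invented for the example, and the Beta CDF is approximated with a simple midpoint rule that is adequate only when both shape parameters are at least 1.

```python
import math


def beta_pdf(t, a, b):
    """Density of the Beta(a, b) distribution at t in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def beta_cdf(t, a, b, n=2000):
    """Beta(a, b) CDF at t by the midpoint rule (assumes a >= 1 and b >= 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    h = t / n
    return h * sum(beta_pdf((k + 0.5) * h, a, b) for k in range(n))


def roc_from_beta(mu, phi, grid):
    """Step 5: convert (mu, phi) to (a, b). Step 6: ROC(t) is the Beta CDF at t."""
    a = mu * phi
    b = (1 - mu) * phi
    roc = [beta_cdf(t, a, b) for t in grid]
    return a, b, roc


# Hypothetical fitted mean and precision for one covariate pattern.
mu, phi = 0.3, 10.0
grid = [l / 100 for l in range(101)]  # FPR grid on [0, 1]

a_hat, b_hat, roc = roc_from_beta(mu, phi, grid)

# AUC by the trapezoidal rule over the FPR grid.
auc = sum((roc[i] + roc[i + 1]) / 2 * (grid[i + 1] - grid[i]) for i in range(100))
```

Because the ROC is the CDF of a variable on (0, 1), the area under it equals one minus the mean of that variable, so the numerical AUC here should be close to 1 − μ. In a covariate-adjusted fit, μ would be computed from the estimated regression coefficients for each covariate pattern before this step.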
Since we do not have access to the same sample used by Zhang et al. (2011), we present the results for the data set to which we have access (n = 2200) and for four random subsets of size 420 with a 1:1 split between treatment and control, in the hope of understanding how data variability and dependency affect the performance of the ROC regression methods. In addition, we consider two further covariates: a binary indicator of high BMI (BMI > 30) and BMI as a continuous covariate. Table 1 reproduces the results of a table given in Zhang et al. (2011), where we have highlighted the potentially significant terms in red. Tables 2 and 3 contain the results obtained when using the ROC methods with strata and horm50 as covariates. The interaction terms were included. Our objective in this study was to use both ROC regression models to obtain estimates of the regression coefficients without being overly concerned about the significance of the terms, as was done in Zhang et al. (2011). When comparing our results with those given in Table 1, it is doubtful that any terms are significant under the beta model, whereas the parametric method may have found some significant AUCs. It appears that the parametric ROC model produces estimates that are closer to those given in Table 1 than the beta method does. This is not surprising, since the parametric ROC model modifies the use of the Mann-Whitney statistic used in Zhang et al. (2011). Although we have elected not to be overly concerned about the standard errors of our estimates in this study, it should be mentioned that the standard errors for the beta method are obtained directly from the beta regression model, whereas the parametric method makes use of bootstrapped estimates.
Tables 4-7 summarize the results when using BMI as a covariate with the entire data set and the four subsets. Tables 4 and 5 summarize the results when using the discrete BMI covariate. Although these tables contain a great deal of detail, both methods indicate that the separation between the treatment and control groups decreases as BMI increases. We see similar results when using BMI as a continuous covariate with both ROC methods (Tables 6 and 7).

Conclusion
Our objective was to demonstrate how two ROC regression methods could be used instead of the more commonly used AUC regression models based upon the Mann-Whitney statistic when one has both discrete and continuous covariates. The ROC models provided believable results when used with data from a clinical study for which the results from the AUC model were published. The parametric ROC model given by Alonzo and Pepe (2002) is widely used and is available in R packages and in Stata. The beta model given by Stanley and Tubbs (2018), which models the placement values, is not as widely used. Yet it performed as well as the parametric model.

Funding Information
No external funding was used for this research and manuscript.

Xing Meng: Computation and computer programming support. Created tables and figures.
J.D. Tubbs: Problem formulation, writing and editing the manuscript.

Ethics
No ethical issues were encountered in connection with this manuscript.