Functional Non-Inferiority Hypothesis Testing for Longitudinal Data

Department of Mathematics-Statistics, Pan African University Institute of Basic Science, Technology and Innovation (PAUSTI)/Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
Service d’épidémiologie et de Santé Publique du Centre Pasteur du Cameroun, Membre du Réseau International des Instituts Pasteur, Yaoundé, Cameroun
Université de Yaoundé I CETIC, UPMC Université Paris 06, IRD, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes (UMMISCO), F-93143, Bondy, France


Introduction
Non-inferiority randomized cohort trials are increasingly used to show the non-inferiority of new health interventions (Ng, 2015). One advantage of this design is the longitudinal nature of the collected data: repeated measurements are taken on each participant during the follow-up period. Depending on the follow-up duration and the delay between measurements, the number of repeated measurements per person can be large. Longitudinal trials thus provide an array of data on the variation of the main endpoint over a predefined time grid. The general goal of a non-inferiority trial is to show that a new health intervention is not inferior to a reference intervention, and the evaluation may concern the variation of the main endpoint over the whole study period. We situate our work in this framework, motivated by the non-inferiority randomized trial conducted by Laurent et al. (2011). However, although the interest lies in the variation of the endpoint over the follow-up period, non-inferiority is usually assessed at a single time point (generally the end of the study), which reduces the non-inferiority hypothesis test to a finite-dimensional problem. In finite dimension, the non-inferiority test is a well-studied problem. The infinite-dimensional functional case, however, raises additional difficulties. In this work, we adopt a functional data analysis approach to perform non-inferiority hypothesis testing.
Although in practice the data are recorded discretely, often with large measurement errors, functional data analysis treats them as though they were curves. Observed data undergo a pre-processing step, usually based on local polynomial or spline methods (Eubank, 1999; Wand and Jones, 1995; Green and Silverman, 1994; Ruppert et al., 2003), which transforms them into smooth curves to which the methods of functional data analysis are applied. In many instances, the pre-processing step is not of great importance. However, some studies have shown that it has the potential to significantly reduce power (Hall and Keilegom, 2007).
Functional hypothesis testing procedures for simple hypotheses have been developed extensively (Darlin, 1957; Johnson and Kotz, 1990). The more common situation involving composite hypotheses is more challenging. Few developments address functional composite hypothesis testing, and fewer still address functional non-inferiority testing. Point-wise tests, $L^2$-norm-based tests (Faraway, 1997; Zhang et al., 2010a; Zhang and Chen, 2007), F-type tests (Shen and Faraway, 2004; Zhang and Liang, 2013) and bootstrap tests (Faraway, 1997; Cuevas et al., 2004; Zhang et al., 2010a; Zhang and Sun, 2010) are well described for two-sided composite hypotheses. However, none of these tests can be applied to one-sided composite hypotheses.
In this work, we use a Bayesian approach to construct an optimal point-wise test, and simultaneous confidence bands to construct a global test procedure, in order to perform non-inferiority hypothesis testing over the whole follow-up period.
Section 2 presents the formulation of the problem and introduces the notion of a functional non-inferiority hypothesis test. Section 3 presents the optimal test for functional data, which is adopted for functional non-inferiority testing on a continuum domain, together with a global test procedure based on simultaneous confidence bands. The adopted methods are assessed in Section 4 through a simulation example.

Formulation of Functional Non-Inferiority Hypothesis Test
Consider an active-controlled non-inferiority trial with repeated measurements over time, where the goal is to test the non-inferiority of a new treatment or health intervention (N) versus a reference health intervention (R). We assume that the endpoint is a continuous variable $X$, denoted $X_N$ and $X_R$ for groups N and R respectively, observed on a finite grid $\{t_1, \dots, t_m\}$. Generally, non-inferiority hypothesis testing is performed at the end of the follow-up period, $t_m$, so the data collected before $t_m$ play no role in the non-inferiority test. However, the repeated outcomes can be viewed as variables of dimension $m$, $(X_N(t_1), \dots, X_N(t_m))$ and $(X_R(t_1), \dots, X_R(t_m))$ for groups N and R respectively. The idea is to reach a decision on non-inferiority over the whole continuum domain $\mathcal{T} = [t_1, t_m]$, not only at the end of the follow-up period $t_m$. To that end, tools such as longitudinal and functional data analysis can be used. Longitudinal data analysis has been widely used in cohort studies. Functional data analysis is more flexible: it consists of modelling or converting the initial data set on the discrete grid $\{t_1, \dots, t_m\}$ into curves, or a functional data set, on the continuum interval $\mathcal{T} = [t_1, t_m]$; Ramsay and Silverman (2005) provide a broad overview. Basically, it consists of assuming that there exist functions $f_{i,c}$ on $\mathcal{T}$ that model the individual trajectories $(X_{i,c}(t_1), \dots, X_{i,c}(t_m))$, $i = 1, \dots, n$, $c \in \{R, N\}$, that is,

$$X_{i,c}(t_j) = f_{i,c}(t_j) + \varepsilon_{i,c}(t_j), \quad j = 1, \dots, m,$$

where $\varepsilon_{i,c}(t_j)$ is the measurement error for the $i$th individual in group $c \in \{R, N\}$ at $t_j$. In practice, the true expression of the individual function $f_{i,c}$ is unknown, but it can be approximated from the observed data using techniques such as local polynomial kernel smoothing, P-splines, regression splines and smoothing splines. The general principle is to choose the approximation of $f_{i,c}$ that minimises the measurement errors.
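As a rough illustration of the smoothing step described above, the sketch below converts one discretely observed trajectory into a curve on the continuum with a Nadaraya-Watson (local-constant) kernel smoother, the simplest member of the local polynomial family. The visit times, measurements and bandwidth are hypothetical values chosen for illustration; they are not taken from the study, and the smoother is not necessarily the one used by the authors.

```python
import math


def kernel_smooth(times, values, t, bandwidth):
    """Nadaraya-Watson (local-constant) estimate of f(t) from noisy points."""
    # Gaussian kernel weights: observations close to t count more.
    weights = [math.exp(-0.5 * ((t - tj) / bandwidth) ** 2) for tj in times]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total


# Hypothetical trajectory of one individual (not data from the study).
times = [0.0, 6.0, 12.0, 18.0, 24.0]
values = [52.0, 210.0, 395.0, 560.0, 770.0]

# Evaluate the smoothed curve on a fine grid of the continuum [0, 24].
curve = [kernel_smooth(times, values, t, bandwidth=4.0) for t in range(0, 25)]
```

A smaller bandwidth follows the observations more closely, while a larger one gives a smoother curve; in practice the bandwidth would be chosen by a data-driven criterion such as cross-validation.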
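Under this model, the underlying individual functions $f_{i,c}$ have mean function $\mu_c$, $c \in \{R, N\}$, and the hypotheses of interest are

$$H_0: \mu_N(t) - \mu_R(t) \le -\delta_L(t) \quad \text{versus} \quad H_1: \mu_N(t) - \mu_R(t) > -\delta_L(t), \quad t \in \mathcal{T}, \qquad (1)$$

where $\delta_L(\cdot)$ is the predefined non-inferiority margin function, such that $\delta_L(t) > 0$ for all $t \in \mathcal{T}$. The hypothesis test in Equation 1 is referred to as the functional non-inferiority hypothesis test on the continuum domain $\mathcal{T}$.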
Most statistical tests proposed for overall testing address the two-sample problem for functional mean differences. Zhang et al. (2010b) and Zhang (2014) proposed an overall test for the mean difference based on the $L^2$-norm, while Staicu et al. (2014) and Shen and Faraway (2004) proposed pseudo likelihood ratio tests. None of these methods can be applied to the one-sided hypothesis testing problem for the difference of functional means in Equation 1. The reason is that the test statistic is built on the $L^2$-norm, or on another global statistic such as the sup-norm (Taylor et al., 2007), which does not reveal the direction of the inequality when the null hypothesis is rejected.
Alternatively, one could use the pointwise approach, which was used by Fogarty and Small (2014) for equivalence testing with functional data. In that case, the compound error of the test over the whole continuum domain must be controlled.

Methodology
Notations and Assumptions
1. We assume that the functional random variables $X_{i,c}$, $c \in \{N, R\}$, are i.i.d. That is, for all $t \in \mathcal{T}$, the real individual trajectories $X_{i,c}(t)$ are independent and identically distributed.
2. A Gaussian process (GP) with mean function $\mu$ and covariance $\gamma$ is denoted $GP(\mu, \gamma)$, and $f_c(t) \sim GP(\mu_c(t), \gamma_c(t))$, $t \in \mathcal{T}$, $c \in \{R, N\}$.
3. Let $\hat{\mu}_c$, $c \in \{N, R\}$, denote the estimator of the mean function $\mu_c$.

Pointwise Approach based Test
To perform the functional non-inferiority hypothesis test in Equation 1, one can carry out the scalar non-inferiority test at each point. For a given $t \in \mathcal{T}$, let $r(t) = 0$ and $r(t) = 1$ when the null and the alternative hypothesis are respectively true, and let $w(t) = 0$ and $w(t) = 1$ when the null and the alternative hypothesis are respectively declared true from the observed data. The sets of true and false null hypotheses are $T_k = \{t \in \mathcal{T} : r(t) = k\}$, $k = 0, 1$, and the sets of points where the null and the alternative are declared from the data are $D_k = \{t \in \mathcal{T} : w(t) = k\}$, $k = 0, 1$, respectively. Table 1 summarizes the outcomes of pointwise functional non-inferiority hypothesis testing.
Assume that at every point $t$ there is a type I error $\alpha_t$. A decision about non-inferiority over the whole domain $\mathcal{T}$ requires a compound error measure, as in multiple hypothesis testing. The False Discovery Rate (FDR), formally introduced in Benjamini and Hochberg (1995), and the Family Wise Error Rate (FWER), introduced in Hochberg and Tamhane (1987), are the main indicators used to evaluate the compound type I error in multiple hypothesis testing. The FWER for functional data of Cox and Lee (2007) is appropriate under the condition of permutation pivotality, which may not hold in functional non-inferiority hypothesis testing: the inequality in the null hypothesis prevents this condition from being satisfied. In this work, the false discovery rate is therefore used as the compound type I error measure. Several methods are devoted to controlling the false discovery rate in the context of mean differences, such as the Benjamini-Hochberg and Benjamini-Yekutieli methods (Benjamini and Hochberg, 1995). Xu et al. (2018) proposed another point-wise method controlling the false discovery rate, which was found to perform better than the Benjamini-Hochberg and Benjamini-Yekutieli methods. Therefore, in this work, the optimal pointwise test introduced in Xu et al. (2018) is adopted for functional non-inferiority hypothesis testing.
The marginal false discovery rate (mfdr) of a test $w$ is defined by

$$\mathrm{mfdr}(w) = \frac{E[L(T_0 \cap D_1)]}{E[L(D_1)]},$$

where $L$ is the Lebesgue measure of interval subsets.
The overall power of the functional non-inferiority test is defined in a similar way as in Leventhal and Huynh (1996) and Sun et al. (2015) for multiple two-sided testing. For a given nominal compound type I error rate $\alpha_c$, we want to determine the test $w$ whose estimated false discovery rate satisfies $\widehat{\mathrm{mfdr}}(w) \le \alpha_c$. As shown in Xu et al. (2018), the optimal test should be searched among a family of tests $w_\lambda$ indexed by a threshold $\lambda$. The false discovery rate of a test $w$ is estimated on points $t_1, \dots, t_N$, the center points of a sequence of subintervals of $\mathcal{T}$. The probability $S_0(t_j) = \Pr(r(t_j) = 0)$ is the probability that the null hypothesis of no non-inferiority is true at the point $t_j$; it is unknown and can be estimated only in a Bayesian setting. A prior distribution is therefore assumed, and the test decision at any point $t$ follows from it. The optimal test controlling the false discovery rate at a given nominal compound type I error rate $\alpha_c$ is then $w = w_{\lambda^\star}$.
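To make the role of the Lebesgue measure concrete, the following sketch approximates the measure of falsely declared points on a grid of equal-length cells and combines several simulated replicates into a ratio of averaged measures. It assumes the true states $r(t_j)$ are known, which is only the case in a simulation; the two replicates and the cell length are hypothetical, and this is not the Bayesian estimator of Xu et al. (2018).

```python
def false_discovery_measure(truth, decision, cell_length):
    """Return (L(T0 ∩ D1), L(D1)) approximated on equal-length grid cells."""
    declared = sum(cell_length for d in decision if d == 1)
    false_declared = sum(
        cell_length for r, d in zip(truth, decision) if r == 0 and d == 1
    )
    return false_declared, declared


def marginal_fdr(replicates, cell_length):
    """Ratio of summed measures over replicates (0.0 if nothing is declared)."""
    pairs = [false_discovery_measure(r, w, cell_length) for r, w in replicates]
    numerator = sum(p[0] for p in pairs)
    denominator = sum(p[1] for p in pairs)
    return numerator / denominator if denominator > 0 else 0.0


# Two hypothetical replicates on 6 cells of length 4 covering [0, 24]:
# each pair is (true state r, declared state w) per cell.
replicates = [
    ([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]),  # one false declaration out of four
    ([0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1]),  # no false declaration out of two
]
estimate = marginal_fdr(replicates, cell_length=4.0)
```

Here the falsely declared measure is 4 out of a total declared measure of 24, so the estimate equals 1/6. Taking the ratio of sums, rather than averaging per-replicate ratios, mirrors the ratio-of-expectations form of the marginal rate.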

Algorithm for Determining the Optimal Test
The expression of mfdr is not explicit; therefore, the determination of $\lambda^\star$ requires numerical computation. As shown in Sun et al. (2015), mfdr is monotone decreasing in $\lambda$, so the $\lambda$ giving the maximum false discovery rate is found in a positive neighbourhood of 0.
Moreover, when $\lambda$ tends to infinity, mfdr tends to zero. The choice of computational algorithm therefore depends on whether $\alpha_c$ is large or small. For larger $\alpha_c$ (for example 10% or 5%), the forward algorithm presented in Algorithm 1 is suitable, with an initial value $\lambda_0$ close to 0. For smaller $\alpha_c$ (for example 2.5% or 1%), the backward algorithm presented in Algorithm 2 may be preferable, with a larger initial value $\lambda_0$. In either case, as with any computational problem that takes an initial input parameter, the results depend on the choice of $\lambda_0$.
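The forward and backward strategies can be sketched as follows. This is only an illustration of the search logic for a monotone decreasing estimate, not a transcription of Algorithm 1 or Algorithm 2: the function `toy_mfdr`, the doubling and halving steps and the final bisection are assumptions made here, while the tolerance and maximal number of iterations play the role of the tuning parameters mentioned in the text.

```python
import math


def bisect(mfdr_hat, alpha_c, lo, hi, tol, max_iter):
    """Shrink [lo, hi], keeping mfdr_hat(lo) > alpha_c >= mfdr_hat(hi)."""
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if mfdr_hat(mid) > alpha_c:
            lo = mid
        else:
            hi = mid
    return hi


def forward_search(mfdr_hat, alpha_c, lam0=1e-10, growth=2.0, tol=1e-8, max_iter=200):
    """Start near 0 and increase lambda until the estimate drops to alpha_c."""
    lo = hi = lam0
    for _ in range(max_iter):
        if mfdr_hat(hi) <= alpha_c:
            return hi if hi == lo else bisect(mfdr_hat, alpha_c, lo, hi, tol, max_iter)
        lo, hi = hi, hi * growth
    raise RuntimeError("no lambda found within max_iter steps")


def backward_search(mfdr_hat, alpha_c, lam0=100.0, shrink=2.0, tol=1e-8, max_iter=200):
    """Start large and decrease lambda while the estimate stays below alpha_c."""
    if mfdr_hat(lam0) > alpha_c:
        raise ValueError("lam0 is too small for a backward search")
    hi = lam0
    for _ in range(max_iter):
        lo = hi / shrink
        if mfdr_hat(lo) > alpha_c:
            return bisect(mfdr_hat, alpha_c, lo, hi, tol, max_iter)
        hi = lo
    return hi


def toy_mfdr(lam):
    # Hypothetical stand-in for the estimated mfdr: decreasing, tends to 0.
    return 0.5 * math.exp(-lam)


lam_forward = forward_search(toy_mfdr, alpha_c=0.10)
lam_backward = backward_search(toy_mfdr, alpha_c=0.10)
```

With this stand-in, both searches return a value close to $\ln 5 \approx 1.609$, the point where `toy_mfdr` crosses 10%. With a real, noisy estimate the two directions need not agree, which is one way the dependence on $\lambda_0$ can arise.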

Confidence Bands based Test
As in non-inferiority hypothesis testing for scalar data (Ng, 2008; Elie et al., 2008; Food and Drug Administration, 2016), confidence bands can be used for the functional non-inferiority test formulated in Equation 1. The idea is to reject the null hypothesis when the margin function $-\delta_L$ lies below the lower confidence band of $\mu_N - \mu_R$ on a subset $\mathcal{S}$ of $\mathcal{T}$. Fig. 1 illustrates functional non-inferiority testing based on confidence bands. Denoting by $l$ and $u$ the lower and upper confidence bands of $\mu_N - \mu_R$ respectively, and by $\beta$ the level of the confidence bands, the test procedure has the following steps:
1. Construct confidence bands $[l, u]$ of level $\beta$ for $\mu_N - \mu_R$.
2. Compare the two functions $l$ and $-\delta_L$ on $\mathcal{T}$.
3. Reject the null hypothesis $H_0$ of no non-inferiority on $\mathcal{T}$ if and only if there exists a subset $\mathcal{S}$ of $\mathcal{T}$ such that $l$ is greater than $-\delta_L$ on $\mathcal{S}$.
The confidence bands can be constructed with a pointwise approach in parametric or non-parametric settings. Such bands are not valid for overall or simultaneous inference, however, since their coverage level is less than $\beta$ (Degras, 2017). In this work, the simultaneous confidence bands (SCB) of Degras (2017) are adopted to define the lower and upper confidence bands $l$ and $u$ for $\mu_N - \mu_R$ at all $t \in \mathcal{T}$. The test based on the SCB of Degras (2017) provides a type I error asymptotically equal to $1 - \beta$ and a statistical power tending to 1. The level of the proposed test procedure for the functional non-inferiority test and its statistical power are evaluated through a simulation example using the Monte Carlo method.
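Steps 2 and 3 of the procedure amount to a comparison of two functions, which can be sketched on a grid as below. The lower band values are hypothetical numbers standing in for a band that would come from an SCB construction, the evaluation grid is an assumption, and the margin is the function $\delta_L(t) = (35/3)t + 50$ used later in the simulation study.

```python
def non_inferiority_region(grid, lower_band, margin):
    """Grid points where the lower band lies strictly above -margin(t)."""
    return [t for t, low in zip(grid, lower_band) if low > -margin(t)]


def margin(t):
    # Margin function of the simulation study: delta_L(t) = (35/3) t + 50.
    return (35.0 / 3.0) * t + 50.0


grid = [0, 6, 12, 18, 24]                              # hypothetical evaluation grid
lower_band = [-80.0, -130.0, -150.0, -200.0, -310.0]   # hypothetical values of l(t)

region = non_inferiority_region(grid, lower_band, margin)
reject_h0 = len(region) > 0  # step 3: reject H0 if the region is non-empty
```

In this example the lower band exceeds $-\delta_L$ at t = 12, 18 and 24, so the null hypothesis would be rejected and the region itself indicates where non-inferiority is supported. A finer grid gives a closer approximation of the subset $\mathcal{S}$ on the continuum.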

Simulations Scenario and Settings
The purpose of the simulation example is to evaluate the proposed functional non-inferiority tests: the optimal pointwise test and the test based on confidence bands. The actual type I error rate is evaluated for the SCB-based test, the mfdr is evaluated for the optimal pointwise test, and the power is estimated for both tests. Two scenarios were considered: one simulating functional data for which the null hypothesis holds, and one for which the alternative holds. The first scenario serves to evaluate the mfdr of the optimal pointwise test and the actual type I error rate of the SCB-based test; the second serves to evaluate the statistical power of both tests.
The input parameters used to generate the functional data sets are the functional margin $\delta_L$, the functional means $\mu_N$ and $\mu_R$, and the covariances $\gamma_N$ and $\gamma_R$. In all simulations, we consider the discrete grid (0, 6, 12, 24) and the continuum domain $\mathcal{T} = [0, 24]$, and we assume equal covariances, $\gamma(t, s) = \gamma_N(t, s) = \gamma_R(t, s)$. We consider correlated data, the case most often encountered in practice, and assume that the correlation depends on the distance between time points: data at two close points are more correlated than data at two distant points.
Equal sample sizes are considered ($n_R = n_N = n$), with $n = 30, 100, 1000$. A Gaussian process was used to simulate the data on the discrete grid (0, 6, 12, 24), then splines were used for smoothing on the continuum domain $\mathcal{T} = [0, 24]$. The nominal compound type I error rate was set to $\alpha_c = 10\%$. In all simulations, the estimation of the mfdr was based on the forward algorithm (Algorithm 1); therefore, the initial parameter $\lambda_0$ was chosen small, in a close neighborhood of 0.
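The data-generating step on the discrete grid can be sketched with the standard library alone. The covariance below, an exponential decay in the distance between time points, is an assumption chosen to reproduce the stated property that closer points are more correlated; its variance and length scale are hypothetical, and the reference mean is $\mu_R(t) = 30t$ from the simulation settings. The authors worked in R, where the mvtnorm package provides the multivariate normal sampler.

```python
import math
import random


def exp_cov(grid, variance, length_scale):
    """Covariance matrix whose entries decay with the distance |s - t|."""
    return [[variance * math.exp(-abs(s - t) / length_scale) for t in grid]
            for s in grid]


def cholesky(a):
    """Lower-triangular factor l with l l^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_trajectory(mean, chol, rng):
    """Draw one Gaussian vector with the given mean and Cholesky factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


grid = [0, 6, 12, 24]
cov = exp_cov(grid, variance=100.0, length_scale=10.0)  # hypothetical values
chol = cholesky(cov)
rng = random.Random(2024)                # fixed seed for reproducibility
mean_r = [30.0 * t for t in grid]        # mu_R(t) = 30 t
sample = [simulate_trajectory(mean_r, chol, rng) for _ in range(30)]  # n = 30
```

Each simulated trajectory would then be smoothed onto the continuum [0, 24] before any test is applied.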
To estimate the false discovery rate of the optimal pointwise test and the actual type I error rate of the SCB-based method, the data are generated under the null hypothesis, for example with $\mu_N(t) = \mu_R(t) - \delta_L(t)$ for all $t \in \mathcal{T}$. In that case we chose $\mu_R(t) = 30t$ and $\delta_L(t) = (35/3)t + 50$. The power of both methods is estimated by drawing the functional data under the alternative. As in the case of scalar data (Zhang, 2006; Flight and Julious, 2016), we consider the particular case $\mu_N = \mu_R$ on $\mathcal{T} = [0, 24]$. The actual type I error rate and statistical power of the SCB-based procedure are also evaluated according to the level of the SCB, for confidence band levels $\beta = 95\%$, 90% and 80%.
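The two scenarios can be written out directly from the stated functions. The short sketch below evaluates the mean difference $\mu_N - \mu_R$ against $-\delta_L$ on the discrete grid (0, 6, 12, 24); the function names are introduced here for illustration only.

```python
def mu_r(t):
    return 30.0 * t                    # reference mean, mu_R(t) = 30 t


def delta_l(t):
    return (35.0 / 3.0) * t + 50.0     # margin, delta_L(t) = (35/3) t + 50


def mu_n_null(t):
    return mu_r(t) - delta_l(t)        # null scenario: on the boundary of H0


def mu_n_alt(t):
    return mu_r(t)                     # alternative scenario: mu_N = mu_R


grid = [0, 6, 12, 24]
rows = [
    (t, mu_n_null(t) - mu_r(t), mu_n_alt(t) - mu_r(t), -delta_l(t))
    for t in grid
]
for t, diff_null, diff_alt, neg_margin in rows:
    print(t, diff_null, diff_alt, neg_margin)
```

Under the null scenario the difference equals $-\delta_L(t)$ at every point, the least favourable case for type I error control; under the alternative the difference is 0, which lies above $-\delta_L(t)$ everywhere because the margin is positive.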
All simulations were conducted in the R programming language (R Core Team, 2016), and the code is accessible in a separate file. The packages fda by Ramsay et al. (2018) and mvtnorm by Genz et al. (2018) were especially useful for simulating and manipulating functional process data. Estimating the probability of the null and the alternative in the optimal test requires a prior distribution on the processes $\mu_N$ and $\mu_R$. Most often, one chooses Gaussian processes with zero means and covariance functions $K_N$ and $K_R$ (equal covariance functions are assumed: $k = K_N = K_R$) belonging to a known family of functions, such as constant, polynomial, Matérn, rational quadratic or exponential. In this work, we use the exponential family, whose covariance function is evaluated at pairs of points $x$ and $y$ and depends on two hyper-parameters, one of which is denoted $w$. Both hyper-parameters are held fixed: $w = 100$ and the other is set to $190^2$.

Simulation Procedures
The mfdr and $\lambda^\star$ are estimated by the following steps:
1. Simulate two samples of Gaussian processes $X_N$ and $X_R$ on the discrete grid {0, 6, 12, 18, 24}, with functional means $\mu_N$ and $\mu_R$ as described above such that the null hypothesis is satisfied, and with sample size $n$.
2. Convert the Gaussian processes $X_N$ and $X_R$ into functional data on the continuum [0, 24], to obtain two functional data sets $X_N(t)$ and $X_R(t)$.
3. Compute the estimated mean functions.
The power is estimated in a similar way, except that in step 1 the data are generated under the alternative hypothesis; the power is then estimated from the simulated data sets.
For the test based on confidence bands, levels of $\beta$ = 95%, 90% and 80% are considered for the confidence bands, with all other input parameters as in the optimal pointwise test. The type I error and statistical power are estimated as follows:
1. Simulate two samples of Gaussian processes $X_N$ and $X_R$ on the discrete grid {0, 6, 12, 18, 24}, with functional means $\mu_N$ and $\mu_R$ as described above such that the null hypothesis is satisfied, and with sample size $n$.
2. Convert the Gaussian processes $X_N$ and $X_R$ into functional data on the continuum [0, 24], to obtain two functional data sets $X_N(t)$ and $X_R(t)$.

Stability Analysis and Simulation Results for Optimal Pointwise Test
The optimal pointwise test relies on a numerical algorithm with input parameters. We study it as a function of the initial parameter $\lambda_0$, with the tolerance tol and the maximal number of iterations Maxiter held fixed. A small value in a close neighborhood of 0 was chosen, $\lambda_0$ = 1e-10, followed by the larger values $\lambda_0 = 1$ and $\lambda_0 = 100$. The aim is to provide a better guess of the initial parameter for more accurate results of the pointwise test. The results are presented in Fig. 2. For the same value of $\lambda_0$, the test with large sample sizes gets closer to the nominal compound type I error rate, while for smaller sample sizes the test is more conservative. Whatever the sample size, the mfdr estimate gets smaller as the initial parameter $\lambda_0$ gets larger, so a larger value of $\lambda_0$ leads to a more conservative test. The results therefore suggest that a small initial guess $\lambda_0$ in a close neighborhood of 0 is preferable. For more accuracy, a stability analysis was carried out to study the variability of the results under a random choice of $\lambda_0$ in a close neighborhood of 0. Because of the high computation time, the stability analysis was limited to three random choices of $\lambda_0$. With the tolerance and the number of iterations fixed, the optimal pointwise test is stable for $\lambda_0$ in a close neighborhood of 0: for the small sample size $n = 30$ the mfdr estimate is around 5%, for the medium sample size $n = 100$ around 7%, and for the large sample size $n = 1000$ around 9%. Concerning the statistical power, as shown in Table 2, the power tends to 1.

Results for the SCB Based Test
The results for the SCB-based test are summarized in Table 3. The type I error rate and power are estimated according to the sample size and the confidence band level. As the sample size increases, the simulated type I error rate decreases and seems to converge to specific values. The results suggest that, for large sample sizes, the method based on confidence bands of level 95%, 90% and 80% leads to a type I error rate of approximately 2.5%, 5% and 10% respectively. Therefore, for a given confidence band level $\beta$, the method produces a test whose type I error rate is asymptotically approximately $(1 - \beta)/2$. Whatever the sample size and the confidence band level, the estimated statistical power was equal to 1.
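The relation between a two-sided band of level $\beta$ and a one-sided type I error of $(1 - \beta)/2$ can be checked in a deliberately simplified setting. The sketch below is not the SCB procedure of Degras (2017): it assumes a single time point, a known common standard deviation and a normal-theory interval, with hypothetical values for the sample size, standard deviation and margin. Data are drawn on the boundary of the null hypothesis and the rejection rate of the lower-bound rule is estimated by Monte Carlo.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(level, n, sigma, delta, n_rep, seed):
    """Monte Carlo rate of 'lower bound > -delta' on the boundary of H0."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided normal quantile
    half_width = z * sigma * math.sqrt(2.0 / n)   # known-variance interval
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        x_r = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_n = [rng.gauss(-delta, sigma) for _ in range(n)]  # mean diff = -delta
        lower = fmean(x_n) - fmean(x_r) - half_width
        if lower > -delta:
            rejections += 1
    return rejections / n_rep


# Hypothetical settings: n = 30 per group, sigma = 10, margin delta = 5.
rate_90 = rejection_rate(level=0.90, n=30, sigma=10.0, delta=5.0,
                         n_rep=4000, seed=1)
```

In this scalar setting the rejection probability is exactly $(1 - 0.90)/2 = 5\%$ in theory, so the Monte Carlo estimate should fall close to that value, in line with the pattern reported for the large sample sizes in Table 3.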

Discussion
This work has introduced functional non-inferiority hypothesis testing for a continuous endpoint in longitudinal trials. After formulating the hypothesis test, the optimal pointwise test of Xu et al. (2018) and the simultaneous confidence bands of Degras (2017) were adopted. The pointwise test has the advantage of showing the region where the result is significant. The classical global tests for the functional two-sample mean problem in Zhang et al. (2010b) and Zhang (2014) could not be adopted, since they are based on a norm that gives no evidence on the direction of the inequality when the null hypothesis is rejected. The proposed SCB-based test procedure can, however, be regarded as a global test. Both proposed test procedures for functional non-inferiority testing performed well for large sample sizes. The added value of this work is that non-inferiority is assessed over a whole continuum domain, not only at the end of the follow-up period. This allows more flexibility in interpreting the results of a trial. It could also be relevant for determining the non-inferiority delay, which could help to set the follow-up duration of future trials with a similar treatment effect. The proposed method based on the pointwise multiple testing procedure involves numerical techniques whose approximations depend on parameters such as the initial value and the tolerance, and the results depend on these in turn. This study provided a stability analysis for a proper guess of the initial value, which had not been studied in Xu et al. (2018).
Like any scientific study, this one has limitations. For example, an improper guess of the initial parameter leads to an overly conservative test. In addition, the optimal test requires non-linear recursive programming, which is costly in execution time. Addressing these issues, and developing a global test for functional non-inferiority based on a test statistic, are interesting avenues for future research. The study introduced the non-inferiority test with a functional endpoint, which had not previously been studied in the literature. This raises many methodological questions specific to non-inferiority trials, such as assay sensitivity, the constancy assumption and the non-inferiority margin, which should be studied for functional endpoints. These may constitute interesting topics for future research on non-inferiority trials.

Conclusion
This article introduced non-inferiority hypothesis testing with a functional endpoint. A pointwise test and a test based on simultaneous confidence bands were proposed. Both procedures performed well for large sample sizes. For small sample sizes, the pointwise test is too conservative, while the test based on simultaneous confidence bands is slightly liberal. Functional endpoints are rarely used in clinical trials. We hope that this study will draw the attention of clinical trial practitioners to the relevance of functional endpoints in clinical trials in general and in non-inferiority trials in particular.