Joint Modelling of Longitudinal-Time-To-Event with Categorical Variable Indicators of Latent Classes: Application to Tuberculosis Data

Corresponsing author: Azeez Adeboye Department of Statistics, University of Fort Hare, South Africa Email: azizadeboye@gmail.com Abstract: In many clinical and reliability research reports, the outcomes of basic interest is the time to a particular event happens in order to indicate the person’s “true” state of health or survival status. Different models have been used to analyze such data separately, but may be unsuitable if the longitudinal and health status measures are correlated. In this study, mixed effect and Cox model of latent class are jointly modelled for the correlation between the covariates, observed and unobserved health status variable with binary latent class indicators. A Bayesian approach for Maximum likelihood estimates is implemented using Markov Chain Monte Carlo (MCMC) techniques. The repeated and survival measures are independently assumed to be a Gaussian process for latent bivariate. The joint model is applied to TB cohort study for the HIV comorbidity effect on event time for Tuberculosis patients. R package is used for curvilinear repeated measures of latent class model and joint latent class models for both repeated measures and survival time event.


Introduction
In various medical researches, the outcomes from the data depend on the time a particular event occurred, for instance time-to-event outcomes in Randomized controlled trials (Argyropoulos and Unruh, 2015). Data in such a way are referred to as time to event (survival) data of a particular interest (Austin et al., 2016). However, the event may not be necessarily death, but could be a remission time from a disease, symptoms relief or a disease recurrence. Many of such studies focus on the effect of patients' information on different survival predictions and jointly model the repeated longitudinal with time-to-event measures to develop a prediction models for dynamic event that reestablish over time the evidence of using joint modelling (Andrinopoulou et al., 2015).
Time-to-event outcomes can be used to censor a longitudinal data and modelling separately the repeated longitudinal and time-to-event measures, for instance using time-dependent random effect models (Barrett and Su, 2017), linear mixed effects models and Cox regression models (Hickey et al., 2016), may sometimes be incompetent to use, which could have a biased effect on the size of the estimates if the two models are correlated (Ibrahim et al., 2010). The Cox proportional hazards model is the most commonly utilized for survival model (Cox, 1992) and serves as part of the techniques used for this study. The Cox model has been included with various extensions such as inclusion of random effects (Vaupel et al., 1979), longitudinal modeling with outcome-dependent drop-out (Henderson et al., 2000), covariates measurement error (Wulfsohn and Tsiatis, 2010), covariates time-dependent (Sweeting and Thompson, 2011), multivariate survival times (Hougaard, 2000) and latent classes for longitudinal measures (Berlin et al., 2014).
However, in our study, the Cox model has been included with a classifying latent class indicators with explanatory covariates measured without error. The class membership information is only indirectly available for categorical variables that depends on the latent class regression model distribution, which can be used for multinomial binary indicators of latent classes. This model consists of a multinomial logit model for the covariates with latent class as well as the association of the observed and the latent classes' indicators. Meanwhile, conditionally independence is one of the assumptions for the indicators with given covariates and latent variables, which the probabilities will also to be independent of the covariates.
A joint model assessment for multiple categorical and survival data is established in this study using the latent class logit regression model as a distribution for categorical data and Cox Proportional hazards model as the distribution for survival times data on the conditional latent class. It is of interest to know the extent in which the disease is related to the survival times. One approach to explore this is to examine and establish the measurement model before including the survival time events with binary covariates using a Cox PH model. A semi-parametric PH model is presented to incorporate a variable with latent class as predictor of time-to-event. A binary value is used indirectly to measure the class association and likelihood estimation function is maximized to establish the joint modelling time event data and latent class variables using EM algorithm.

The Methods for the Joint Modelling
The joint modelling comprises of two sub-models: A generalized latent class logit model for binary variables and a Cox proportional hazards model for survival times.

The Generalized Latent Class Logit Model
Yi = (Yi1,….,Yij) t represents a multiple random variables (random vector) of J binary values for the ith individual in random variables with N individual population, where Yij is the response of case i on item j of J items and t of a particular class. Let Zi = (zi1,….,ziP) denotes (1 X P) row vector of zth covariates for ith individual. Assume that the population comprises of k subpopulations and with latent class variable Xi of the ith individual unobserved. Therefore, the conditional latent class probability distribution for the jth individual is expressed as: where, x = (x1,…., xJ) t and x = (1,…..,K) are parameters for class-specific probabilities of each of the individuals with value 1. Also, the probability of latent class assumed J individuals are mutually independent event, we have: However, the generalized logistic model is applied for modelling the association between the covariates (Zi) and the latent class (Xi) is given as: where, k is a (P1) scalar vector consisting the parameters for the kth group. Therefore, the distribution of (Yi,Xi) is expressed as:

The Cox Proportional Hazards Model
Let h(t|wi1,…,wik) signifies the hazard function of th i individual at time t and Wi is the covariates vector comprising of categorical and continuous variables and 0(t) is signified as baseline hazard function, which corresponds to the intercept term in the regression model. In proportional hazard model, 0(t) is not specified but positive and assumed a completely nonparametric shape for, which can be expressed as: where, Wi = (wi1,….,wiQ) stands for (1Q) row vector with a corresponding  is the (Q1) parameter vector. The (K1) is the parameter vector of  that contain the effect of the latent class variable Xi on the hazard (1 = 0). As a result of (4), the probability density distribution of the time to events is expressed as: is the baseline hazard when integrated. Meanwhile, in a case when the event time is not right-censored but has a non-informative censoring, the probability density distribution (Ui,i) becomes: However, this study assumes to have non-informative censoring.

The Joint Modelling of Cox PH with Latent Class Mixed Model Indicator
Using a latent class mixed model for regressing (Yi,Xi) on Zi and (i,i) on (zi,xi) for Cox Proportional Hazards model, therefore, the joint distribution model of (i,i) on (zi,xi) is expressed as: This study uses Expectation-Maximization (EM) algorithm, which applied to find the maximum likelihood estimates of observed parameters (i,i,Yi). This is achieved through iterative method of E-step and M-step. The E-step is used to estimate the expected log-likelihood of the complete data through a conditional estimation on the observed data while Mstep is applied in new parameters estimation to maximize the expected log-likelihood.
Suppose  = (,,0(t), ,v) hence, the log-likelihood for the complete data is expressed as: From the E-step algorithm, Qi(;  () ) = E(Lcmpt(;U,,Y,X)|U,,Y; () ) is estimation of the expected log-likelihood of the complete data with respect to the conditional distribution of (X|U, ,Y) to be calculated, where  () is the parameter estimate of  from the rth step.
In the M-step iteration, we maximize the E-step with respect to the parameters obtained in a new setting of the s i.e., Q is maximize as a function of . We expressed the posterior distribution of Xi with given parameter data for  =  () to be   i    = Pr(Xi = x|U, ,Y;  () ), c = 1,…,p.
Hence, Q and π are maximized as: For Q and κ is maximized as: in the M-step, in which the update of 0 depends on (,v). For the update, the following steps are considered: iii. Iteration scheme is carried between (i) and (ii) until convergence

Estimation and Results
Simulation for the estimation of the model was performed to illustrate the method and examine the practicality properties of the proposed joint latent class model and its parameters performances. We generated for n subjects for longitudinal data from a conditional distribution that is Bernoulli distribution as given by (1) and Weibull distribution for the baseline hazard proportional model to simplify the computation. We described the change in the longitudinal data subject over time using a mixed effects model as a latent class subjectspecific variables. We used binary covariates, time continuous covariate and also consider the interaction between them along the intercept term. The covariate zi represents the ith longitudinal subjects' characteristics.
We denoted the scale and shape parameters of Weibull distribution as . The censoring procedure used is assumed to be uniformly distributed and the given age is transformed linearly to create an effective age distribution function. In each subject, we generated a simulated repeated measurements of n = 200 replicates and k = 2 number of latent class-specific. In each of Monte Carlo simulated data sets, a sample of n = 200 and 500 were generated for a joint modelling defined by  = 0 with  = 1.0,  = 0.80 and 0() = 1. The number of repeated nominal time schedule measurements were taken for tij = 0, 2,4,8,16,24,32,40,48,56,64,72,80 and latent class were selected to cater for sufficient and adequate observations in each of the class-specific for easier posterior classification and to show the accuracy of the parameter estimation.
The latent class assumed to be represented as: Logit (i) = xi-0.5. The slope and the intercept were assumed to be distributed uniformly with independent measurement of error (ijN(0,0.1)). The estimates from both longitudinal and survival part were done for the two classes with shape of 1.5 and scale of 20 for class1 and shape 0.5 and scale 10 for class2 respectively. The simulation was performed using R language package.

The Simulation Results
The estimated parameter bias and standard error of the posterior means are reported in Table 1. There was overestimation in the longitudinal measurements of both the intercept and time (tij) when the sample size is small (n = 200) and the biases estimation for these parameters were more in latent class1 joint model compared to class2. BIC values are used to select the size of the latent classes and used it for the proposed class size K, then compare with the model with smallest BIC. To avoid overestimation and underestimation, we increased the sample size to 500 and found that the two latent class model BIC values are smaller compared to one latent class model and three latent class model respectively. However, this is correctly used to select the number of latent classes with the smallest BIC criterion for simulated datasets. The simulated data of n = 500, is used to evaluate the true value difference in the latent classes joint model. The true value slopes decreases in the same direction for the two latent classes (0.11 against 0.10). Therefore, the important of joint latent class model to identify various sub-groups increases the heterogeneity of the model across the latent class and the estimated standard errors tend to be slightly larger than the true ones, which makes the joint model to be conservative.

Estimation Results of Real Data
The joint model is applied to the TB longitudinal retrospective cohort data of all confirmed TB diagnosed in Eastern Cape Province from 2010 to 2015 recorded on monthly and yearly to study the association effect regarding the severe TB prognosis of time event data for elderly patients with those with HIV infection, dialysis, state of immunosuppression and cases of multi-drug resistant TB death risk. A total of 449 patients were included in the study. Due to too small variances of random effects, the age is subtracted from 25 and divided by 30 to decrease the numerical constraints of too large ages in quadratic mixed models. We also considered some covariates such as gender, age, smoking, alcohol use, body weight, smear status, type of TB and diabetes. The time-to-event was measured in days to accommodate the effect of TB prognosis on survival time for TB patients with severity and without.

Model Fit The Latent Class Mixed Models (LCMM)
The LCMM was implemented to illustrate the quadratic trajectories of TB prognosis factors by assuming that there is correlated random effects for TB patients age-factors functions. The model summary indicates the dataset information, quantity of subjects, number of missing observation deleted, latent classes and parameters. It shows also the number of iterations during the convergence process and criteria to show if the model converged correctly or satisfied. It gives the information about model goodness of fit estimates, which include the maximum log-likelihood value, Akaike Criterion (AIC) and Bayesian Information Criterion (BIC) values. The model also shows the parameter estimated, standard error of estimates, the approximately normal Wald Test statistics and the p-values.
The process of estimating the models with varying latent classes gives the values of log-likelihood estimates, parameters estimates, BIC and posterior proportion of each class. The optimal number of latent classes chosen is two according to the BIC in Table 2. The Bayesian Information criterion for the one (190.18851), two (187.82920, 185.54193, 184.40445) and three-class models (186.89412, 187.32477) respectively, indicating that the two latent class model is better. The proportion of replications from 2c BIC correctly identified the class model 92.1% of the time on the average and that the proportion was close to the probability of choosing the 2 class model randomly out of 1 class and 3 class models. The two-class model corresponds to the most risk TB prognosis factors response patterns, which are diabetes as an indicator and smoking as an indicator. The two groups of the two latent classes as a representative parts for each of the latent classes in terms of TB indicators of Age and gender of the patients (Table 3). The two latent classes may be explained as: (a) a diabetic patients have high prognosis risk factors of TB (b) smoking status of patients is a great risk factors of TB.
The age and gender of patient with TB are treated as predictors of latent class, which are significantly associated with latent class membership with an interaction between the two indicators ( Table 3). Interaction of differential item functioning (DIF) is examined in model 2c. The TB factor predictors (age and gender variables) are added separately for each of the five binary factor variables. The differential item functioning estimates and standard errors are summarized in Table 3. The results of the analysis indicate that no significant differential item functioning because the variables from both groups have a different probability results given in each of the responses (the values of z-scores in Table 3 are approximately normally distributed). DIF shows that the items are measuring different abilities for TB indicators subgroups to see if the items are measuring the same way for all TB prognostic factors subgroups. There is interaction of DIF of TB prognostic factors among TB patients from different groups with the same underlying true ability have a different estimation of giving the same results. It showed that Diabetes has highest DIF among the TB prognostic factors for both age and gender (11.33 and 2.18) follow by the alcohol use by the TB patients (0.57 and 1.00). The least of estimation in DIF was found in smoking habit of the TB patients.
The mean subject-specific predictions of the model is presented in Fig. 1a-1d and the goodness of fit statistic of the model support the model using the subject-specific and marginal residual plots in the two-latent class mixed model presented in Figure 4 (Appendix file).

Joint Modelling of Cox PH and Latent Mixed Models
The analysis of joint model is to fit a latent class mixed effects model and survival model using Cox proportional hazard model with baseline hazard functions estimated by Weibull distribution. The models considered having TB prognosis variables (age, gender, diabetes and smoking) as covariates for the latent class mixed model. In the analysis, class-specific quadratic trajectories of TB prognosis risk variables and adjusted for variable TB indicators. We jointly modelled the risk of severe TB prognosis according to diabetes and smoking habit assuming class-specific Weibull baseline risk functions with age and gender of TB patients.
Furthermore, we performed Likelihood Ratio (LR) test in comparing the joint model to a longitudinal model with the same covariate effect across latent classes. The analysis of the LR test is to show the time effect in longitudinal trajectories, with p-value < 0.01, which indicates over the time that the risk of severe TB prognosis has the same pattern validating the use of class specific time effects in the model. We additionally check all covariates (diabetes, smoking status and gender) and found them all to be less than 0.05. In this way, we keep all the covariates that are significant as class-specific in the model and use them to describe the effect of covariates on longitudinal and survival results in classspecific time analysis.   Est    Table 4. The estimation results of logistic analysis model indicates that age, gender, diabetes and smoking status are all significant covariates in subjects' classification (P-value <0.05). Male TB patients, who are older in age, diabetic and smoking were less likely to be classified into the class2, signifying that the class2 may comprise of the patients with better health status i.e., may not have the likely course or experience a TB disease. Comparing the estimation in the class2 (0.096) and class1 (0.761) with respect to age (age25) was observed that class1is higher and significantly associated with TB prognosis as the patients getting older but not significant in class2. The value associated with gender among TB patients in class1 is significantly likely to increase the severity of TB disease (0.911) compare to patients in class2. The value of interaction of age and gender of TB patients in class1 was higher compare to class2 and they have a significant effect on event of TB disease prognostic. Two class model possess small biases in parameter estimates and standard error estimates are close to their empirical values.
As for the survival analysis part, diabetes and smoking status are significant to the time-to-event for adults with TB prognosis. This shows that both diabetes and smoking status are significant factors on time-toevent of TB diseases. Patients in the class2 had a better survival rate (HR = 1.128) compare to class1 (HR = 1.284) concerning the effect of severe TB prognosis on the time-to-event for adults with TB disease.   The mean of posterior probabilities from the posterior classification (Table 5)  variance of random effects results show that the heterogeneity estimation in one class joint model is larger (1.056) than variance from latent class joint model, suggesting that classification of latent classes reduces the heterogeneity within each class (Table 5).
In summary, part of the objective in this joint modelling is aspect of dynamic prediction of the event for TB patients. We use EPOCE function to compute the predictive ability of our models using the Expected Prognostic Observed Cross Entropy (EPOCE) at different landmark times. The EPOCE is plotted to compare different models to envision the model predictive power at different landmark times. The predictive power difference between the two models is computed using Diffepoce function in displaying the associated plot function for both models.
Joint models for TB prognostic factors for two latent classes display a better predictive power showing a lesser EPOCE than a simple one class survival model.
Although the two latent class model gives a better goodness-of-fit in terms of BIC with good predictive accuracy, particularly after 80 years of age (Fig. 2).

Discussion
In this study, our model encompasses latent class indicator to jointly model the longitudinal and survival outcomes concurrently. The Cox proportional hazard model has been expanded to encompass variables with latent class measured by binary indicator as factor of time-to-event data. The latent classes stand for different categories of trajectories which were repeatedly measured and assumed to be a normal distribution with given latent class. The proposed model is used to evaluate the likelihood through the mixed effect model and survival processes to discover the underlying trends for the two processes efficiently. Hypertension free probability Diabetes Diabetes free probability The class-specific mean predicted trajectories in each latent class are shown in Fig. 3 according to two prognostic factors (smoking and diabetes). The plots depicts predicted trajectories for both fitted models at a different combination of covariates; male patients who are smoking and above the age of 65years for class1 (dotted lines) and class2 (solid lines). These patients are observed to be at the of severe TB prognostic risk of a prior smoking habit particularly for patients older than 25years. We discovered a relatively increased probability of TB prognosis over the time and the survival of patients having a drug resistant TB is lower in class2 with regard to the patients' age compared to class1.
In finding the estimators for global maximum likelihood, the inference procedures is set up sequentially for initial values in the estimation of the data. In each plots in Fig. 1a and 1b, the dotted lines represent true trajectories and solid lines give the average predicted model trajectories. The below set of dotted and solid lines predicted trajectories are for risk = 0 and above dotted and solid predicted trajectories lines are for risk = 1. The difference between the solid and dashed lines in each pair indicates the model bias. In overall, for one predicted joint model, predicted trajectories are observed with small bias regardless of fitted model (Fig. 1a). For the two interacting predicted joint model, the plots correspond to where the predicted trajectories are provided not only by number of risk, but also by survival status. The plots are largely in bias similar to mixed model plots but smaller in bias regardless of fitted model for the two class joint model. In contrary to the one-predicted joint model case, few bias is also observed for values at the extreme ends when twointeracting predictor joint model is fitted. Summarily, for one or two interacting predictor joint models to be considered, we computed the trajectories prediction from a fitted weighted subject-specific predictions and/or cross-validated EPOCE predictions, regardless of the individual true distribution of estimates differences and little bias expectation. When the sample size is large, trajectories prediction tends to be slightly more efficient when fitting the model.
The model goodness of fit is assessed through the comparison of predicted values against the observed values of the repeated measurement outcomes and plot the martingale residuals of the time-to-event outcomes (Appendix A- Fig. 4). More information about the model fit and subject heterogeneity can be provided through the plots by each classes. However, other forms of estimating baseline hazard functions can be explored such as generalized Gamma distribution and piecewise constant baseline hazard (Liu, 2009;Cox et al., 2007).

Conclusion and Recommendation
The estimation of parameters is jointly analyzed for the latent class mixed effect model with binary indicator and the Cox PH model using EM algorithm. The algorithm is constructed with nonparametric maximum likelihood estimation for the baseline hazard with Weibull distribution for the Cox model. Through simulation, we examined the performance of the joint model with two binary latent class indicators in calculating the joint likelihood and compare it with one class joint model using the likelihood ratio test, variance of the random effects and plotted EPOCE plot to show the comparison of different models to show the predictive power at different landmark times. Additionally, the latent class joint model in our study determined TB prognostic factors, in which the effects can be reduced largely by changing the lifestyle and TB treatment indictors, in which the effects of primary outcomes for TB disease can hardly be improved unless the other risk factors are controlled. The posterior classification from the results illustrated the different trends in the latent classes to show prognostic mean probabilities in each class threshold. The performance of the model from the simulation suggest that the joint estimation with latent class indicators performs better in finite samples. The model is applied on the real dataset to show the advantages of latent class inclusion in the model.
We use EM algorithm for maximum likelihood approach but a Bayesian approach may be of use in the same context. A longitudinal mixed effect data that would allow Poisson indicators can also be looked into as part of the joint model extension. Also, a good idea missing data approach to incorporate a multiple imputation would be suitable for use in a joint modelling approach and useful for future research. A shared random effect to join the longitudinal and survival processes is also a good idea (Liu et al., 2015). It should be noted that sensitivity analysis should also be conducted to estimate the impact of the number of degrees of freedom used for the survival and longitudinal trajectories indicator.