Determination of Predictors Associated With HIV/AIDS Patients on ART Using Accelerated Failure Time Model for Interval Censored Survival Data

Corresponding Author: Prafulla Kumar Swain, Department of Statistics, Faculty of Mathematical Sciences, University of Delhi, Delhi 110007, India. Email: prafulla86@gmail.com
Abstract: The main objective of this paper is to identify the independent predictors affecting the survival of HIV/AIDS infected patients on Antiretroviral Therapy (ART), an interval censored event time outcome. A total of 2052 HIV/AIDS patients who were on ART at Ram Manohar Lohia Hospital, New Delhi, India, during the period April 2004 to December 2010 were included in the analysis. Accelerated Failure Time Models (AFTM), viz., exponential, Weibull, lognormal and loglogistic, for interval censored survival data have been used to determine the significant predictors for HIV/AIDS infected patients. The best model is selected on the basis of Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values. Out of 2052 HIV/AIDS patients, 65.4% were males and 34.6% were females. A majority of patients (93.7%) had CD4 cell counts below 350 cells/mm³ at the time of initiation of ART. The mean age of patients at diagnosis was 34.28±8.19 years. The prognostic factors, viz., age, sex, CD4 cell count, past smokers, baseline hemoglobin and baseline BMI, are found to be statistically significant (p<0.001) for HIV/AIDS patients on ART. Hence, special attention is needed for patients with low CD4 cell counts, low BMI and low hemoglobin. The lognormal AFT model is found to be the best model to identify the independent predictors of survival in the HIV population.


Introduction
The main interest in survival analysis is to identify the relationship between the survival time and a set of covariates. In most applications the survival time, or the time to a particular event of interest, is either observed exactly or right-censored, and numerous techniques are available for estimating the survival function and the effects of covariates in these cases. In certain situations, such as HIV/AIDS studies, time-to-event data are collected by assessing patients at periodic follow-up visits. Events such as the onset of HIV infection, the end of the incubation period of AIDS after HIV infection, or death cannot then be observed exactly; each is only known to have happened within some interval, so the observed events are interval censored. Every patient is supposed to visit the Antiretroviral Therapy (ART) centre every four weeks, but the actual visit time may vary from patient to patient, and the time between visits may also vary. Patients may visit the ART centre at a time that is convenient to them rather than at the scheduled time. Therefore the data pertaining to the terminating event (i.e., death) are treated as interval censored survival data rather than right censored data. Suppose that the survival time $T_i$ of a patient lies in the interval $(L_i, R_i)$, where $L_i$ is the last available visit date and $R_i$ is the end of the study.
The theory developed for right censored data cannot be applied directly to interval censored data, due to the complexity and special structure of interval censoring. Nevertheless, right censored data can be considered as a special case of interval censored data (Singh and Totawattage, 2013). Several approaches have been proposed for estimating the survival function from interval censored data. Peto (1973) was the first to propose a non-parametric method for estimating the survival function based on interval censored data. Turnbull (1974) proposed a non-parametric method of estimating the survivorship function for doubly censored data. Finkelstein and Wolfe (1985) provided a method based on regression analysis to accommodate interval censored data.
The semi-parametric Cox PH model is the most common approach in survival analysis and has been used to evaluate covariate effects on the hazard function of failure time data. However, the results from a PH model are difficult to interpret in terms of survivorship, and the parametric AFTM is regarded as an attractive alternative to the Cox PH model. The AFT model provides concise and more intuitively interpretable results for survival data (Wei, 1992). The AFTM is a linear regression model in which the response variable is the logarithm, or a known monotone transformation, of a failure time (Kalbfleisch and Prentice, 1980). The AFTM has been extensively studied by Buckley and James (1979), Koul et al. (1981), Robins and Tsiatis (1992), Jin et al. (2003), Kwong and Hutton (2003) and Orbe et al. (2002). Kay and Kinnersley (2002) suggested that the AFTM is an alternative choice when the proportional hazards assumption does not hold. Patel et al. (2006) suggested that the AFT model should be considered as an alternative to the PH model in the analysis of time-to-event data in medical research. Huang and Xie (2007) proposed a least absolute deviation method for estimation in the AFTM with right censored data. Hernán et al. (2005) developed a structural AFT model for estimating the effect of HAART on AIDS-free survival in two prospective cohort studies of HIV infected individuals. Xue et al. (2006) discussed a semi-parametric accelerated failure time regression model for interval censored HIV/AIDS data. Grover and Banerjee (2011) estimated the survival of HIV-1 infected children for doubly and interval censored data.
The introduction of Antiretroviral Therapy (ART) has significantly reduced morbidity and mortality in HIV-infected patients in various developed and developing countries (Stringer et al., 2006; Severe et al., 2005; Sharma et al., 2010). Several studies describe the estimation of survival of HIV populations and the effect of underlying covariates on the hazard of death among HIV patients on ART, using the Cox proportional hazards model for right censored data in India and other resource-limited countries (Ghate et al., 2011; Rai et al., 2013; Kee et al., 2009; Jerene et al., 2006). However, no one has tried to identify the significant predictors associated with the survival of HIV/AIDS patients undergoing ART using a parametric AFT model in an interval censored setup.
We have modeled the HIV/AIDS population on ART using the AFTM, and the models have been compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The Cox-Snell residual plot has also been used to assess overall model fit. The R packages survival and interval, and STATA (version 11.1), have been used to perform the statistical analyses.
The remaining part of the paper is organized into three sections. Section 2 deals with the methods used, Section 3 presents the results for the HIV/AIDS data, and Section 4 gives the discussion and concluding remarks.

Accelerated Failure Time Model (AFTM)
Suppose that $n$ HIV/AIDS patients are under ART. Let $T_i$ be the survival time (i.e., time to death after initiation of ART) of the $i$th HIV/AIDS patient, with survival function $S(t)$. Let $(L_i, R_i)$ be the interval in which $T_i$ is observed, such that $L_i < T_i < R_i$. If the event does not occur by the end of the study, the patient is said to be right censored; in this case we assume that $T_i$ lies in the interval $(L_i, \infty)$, where $L_i$ is the time from the beginning of the study until the last visit.
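The censoring scheme described above can be illustrated with a short sketch. It assumes times are measured in months from ART initiation; the function name `censoring_interval` and the example values are hypothetical and are not taken from the study data.

```python
import math

def censoring_interval(last_seen_alive, death_known_by=None):
    """Return (L, R) for one patient, in months since ART initiation.

    last_seen_alive: time of the last visit at which the patient was alive.
    death_known_by:  time by which death is known to have occurred,
                     or None if the patient was alive at the end of study.
    """
    if death_known_by is None:
        # Right censored: the event can only lie in (L, infinity).
        return (last_seen_alive, math.inf)
    if death_known_by <= last_seen_alive:
        raise ValueError("R must be later than L")
    # Interval censored: L < T < R.
    return (last_seen_alive, death_known_by)

# Hypothetical patients (illustrative values only).
patients = [(14.0, 80.0), (36.5, None), (2.0, 80.0)]
intervals = [censoring_interval(l, r) for l, r in patients]
```

A right censored patient is thus represented by an infinite right endpoint, which lets both kinds of observation be handled by the same likelihood.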
Let $X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})'$ be the values of $p$ covariates for the $i$th patient. Then the log-linear form of the AFTM describes a linear relationship between the logarithm of survival time and the covariates, as given by Kalbfleisch and Prentice (1980):
$$\log T_i = \mu + \beta' X_i + \sigma \varepsilon_i, \tag{1.1}$$
where $\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of regression coefficients, $\mu$ and $\sigma$ are the intercept and scale parameters respectively, and the error term $\varepsilon_i$ is assumed to follow a specified distribution (i.e., extreme value, normal or logistic). This transformation leads to the Weibull, lognormal or loglogistic AFT models for $T_i$ (Collett, 2003).

Maximum Likelihood Estimation
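Suppose that $Y_i = \log(T_i)$; then the density function of $Y_i$ is
$$\frac{1}{\sigma} f\!\left(\frac{y - X_i'\beta}{\sigma}\right), \qquad -\infty < y < \infty. \tag{1.2}$$
Here we assume that $X_{i1} = 1$ for all $i$, so that the intercept is included in $\beta$. Then the likelihood function for the set of observed intervals $\{(L_i, R_i], X_i, i = 1, 2, \ldots, n\}$ can be written as:
$$\mathcal{L}(\beta, \sigma) = \prod_{i=1}^{n} \left[S(u_i) - S(v_i)\right]^{\delta_i} \left[S(u_i)\right]^{1 - \delta_i}, \tag{1.3}$$
where:
$$u_i = \frac{\log L_i - X_i'\beta}{\sigma}, \qquad v_i = \frac{\log R_i - X_i'\beta}{\sigma},$$
and $\delta_i$ is defined as:
$$\delta_i = \begin{cases} 1, & \text{if the event is observed within } (L_i, R_i], \\ 0, & \text{if the observation is right censored } (R_i = \infty). \end{cases}$$
The functions $f(w)$ and $S(w)$ denote the probability density and survival functions of the error variable $w = (y - X_i'\beta)/\sigma$ in model (1.2), respectively. The estimates can be obtained by maximizing the likelihood (1.3) with respect to $\beta$ and $\sigma$. A complete description of the procedure is given in Klein and Moeschberger (1997) and Sun (2006).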
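As an illustration of how an interval censored likelihood of this kind can be evaluated, the following is a minimal sketch rather than the authors' implementation. It assumes each patient contributes $S(u_i) - S(v_i)$, with $S(v_i) = 0$ when the patient is right censored. The function names and the toy data are hypothetical.

```python
import math

# Survival functions S(w) of the standardised error term.
def surv_normal(w):         # normal error -> lognormal AFT
    return 0.5 * math.erfc(w / math.sqrt(2.0))

def surv_logistic(w):       # logistic error -> loglogistic AFT
    return 1.0 / (1.0 + math.exp(w))

def surv_extreme_value(w):  # extreme value error -> Weibull AFT
    return math.exp(-math.exp(w))

def log_likelihood(beta, sigma, data, surv):
    """Interval censored AFT log-likelihood.

    data: iterable of (L, R, x) with 0 < L < R; R = math.inf marks a
          right censored patient; x starts with 1.0 for the intercept.
    """
    total = 0.0
    for left, right, x in data:
        lin = sum(b * xj for b, xj in zip(beta, x))
        s_left = surv((math.log(left) - lin) / sigma)
        if math.isinf(right):
            s_right = 0.0   # S(+infinity) = 0
        else:
            s_right = surv((math.log(right) - lin) / sigma)
        total += math.log(s_left - s_right)
    return total

# Toy data: one interval censored and one right censored patient.
toy = [(1.0, math.e, [1.0]), (1.0, math.inf, [1.0])]
ll = log_likelihood([0.0], 1.0, toy, surv_normal)
```

In practice the negative of this function would be passed to a numerical optimiser to obtain the estimates of the regression coefficients and the scale parameter; swapping the `surv` argument switches between the lognormal, loglogistic and Weibull models.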
Results obtained from AFT models can be summarized in exponentiated form as the time ratio, $\mathrm{TR} = \exp(\beta)$, in contrast to the hazard ratio of the Cox model. Thus $\mathrm{TR} > 1$ indicates prolonged survival time, and $\mathrm{TR} < 1$ is associated with a decrease in survival time.
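A small sketch of this conversion is given below. The Wald-type interval on the log scale is a common convention assumed here for illustration and is not stated in the paper; the coefficient and standard error are hypothetical values, not entries from Table 2.

```python
import math

def time_ratio(beta_hat, std_err, z=1.96):
    """Return (TR, lower, upper) with a Wald interval built on the log scale."""
    tr = math.exp(beta_hat)
    lower = math.exp(beta_hat - z * std_err)
    upper = math.exp(beta_hat + z * std_err)
    return tr, lower, upper

# Hypothetical coefficient and standard error (not from Table 2).
tr, lo, hi = time_ratio(0.25, 0.10)
```

A coefficient of zero gives a time ratio of exactly 1, meaning the covariate neither stretches nor shortens the expected survival time.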

Model Comparison
In order to select the best fitting model, we have used the AIC and BIC. The AIC provides a practical and versatile way to identify a parsimonious model from a set of competing models, by adding a penalty term proportional to the number of parameters in the model. This penalty term guards against overfitting. The AIC is defined as:
$$\mathrm{AIC} = -2 \log \mathcal{L} + 2(p + k),$$
where $\mathcal{L}$ is the maximized likelihood, $p$ is the number of covariates in the model, and $k = 1$ for the exponential and $k = 2$ for the Weibull, loglogistic and lognormal models. The model with the smallest AIC value is taken as the best model. The AIC penalizes the number of parameters less strongly than the BIC (Schwarz, 1978). The BIC is defined as:
$$\mathrm{BIC} = -2 \log \mathcal{L} + p \log(n),$$
where $p$ represents the number of covariates in the model and $n$ represents the number of data points. The main advantage of the BIC is that its penalty for the number of parameters being estimated grows with the sample size. The model with the smallest BIC value is chosen as the best model.
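The model comparison can be sketched as follows, taking AIC = -2 log L + 2(p + k) and, following the definitions given in the text, BIC = -2 log L + p log(n). Note that many software packages instead count all estimated parameters in the BIC penalty. The log-likelihood values and model labels below are hypothetical placeholders, not results from this study.

```python
import math

def aic(loglik, p, k):
    """AIC with p covariates and k distribution-specific parameters."""
    return -2.0 * loglik + 2.0 * (p + k)

def bic(loglik, p, n):
    """BIC with p covariates and n data points (form assumed from the text)."""
    return -2.0 * loglik + p * math.log(n)

# Hypothetical maximised log-likelihoods: name -> (loglik, k).
fits = {"model_A": (-1210.0, 1), "model_B": (-1188.0, 2), "model_C": (-1195.0, 2)}
p, n = 8, 2052

table = {name: (aic(ll, p, k), bic(ll, p, n)) for name, (ll, k) in fits.items()}
best_by_aic = min(table, key=lambda name: table[name][0])
best_by_bic = min(table, key=lambda name: table[name][1])
```

When the two criteria agree, as they do for these placeholder values, the choice of model is straightforward; when they disagree, the BIC favours the more parsimonious model.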

Data Sources
The data set consists of 2052 HIV/AIDS patients who were undergoing Antiretroviral Therapy at the ART centre of Dr. Ram Manohar Lohia Hospital, New Delhi, India, during the period April 2004 to December 2010. Information was collected retrospectively, and patients were subsequently followed up through the routine ART registers. The inclusion criteria were age above 18 years and complete baseline information on CD4 cell count, date of visit, mode of transmission, weight, hemoglobin, etc.; 2052 patients were found to be eligible for the analysis. At the end of the study period, 273 patients had died and the remaining 1779 patients were known to be alive. Since the exact date of death is not known, the event death is only known to lie in an interval between the last available visit date and the end of the study. Thus the observed events lead to interval censored survival data.

Results
Table 1 contains the descriptive statistics of the study population. Out of 2052 HIV/AIDS patients, 65.4% were males and 34.6% were females. A majority of patients (93.7%) had CD4 cell counts below 350 cells/mm³ at the time of enrolment. The mean age of patients at diagnosis was 34.28±8.19 years. During follow-up, 13.3% of patients died and the remainder were alive at the end of the study. The most common mode of HIV transmission was sexual (hetero + homo), accounting for 74.3% of patients; 8.2% of transmissions occurred through blood + IDU, and for the remaining 17.5% of patients the mode of transmission was unknown. Most of the patients were from urban areas, and 80.7% of patients were married. Addictions to smoking, alcohol and drugs were recorded in 31.9%, 36% and 1.5% of patients respectively. In terms of body mass index, the majority of the patients were normal, with a mean of 20.12±2.66. The majority of the patients also had a normal hemoglobin level at the time of enrolment, with a mean of 11.01±1.78. The log-rank test showed significant differences in survival between the categories of the predictors sex, mode of transmission (MOT), CD4, alcohol and hemoglobin.
Table 2 shows the coefficients, the corresponding standard errors and the TR for the different AFT models (exponential, Weibull, lognormal and loglogistic) for interval censored data. The covariates age, sex, CD4, unknown mode of transmission, hemoglobin and BMI are found to be statistically significant factors for the survival of HIV/AIDS patients on ART. Location status (rural area) is found to be significant in the exponential and Weibull AFT models, whereas it is not found to be significant in the lognormal and loglogistic models. Alcohol is found to be significant in the lognormal and loglogistic AFT models but not in the exponential and Weibull models.
We also observe that younger patients (<35 years of age) had better survival (TR = 1.017, 1.014, 1.017 and 1.015 in the exponential, Weibull, lognormal and loglogistic models respectively). Female patients had much better survival than their male counterparts. Patients who had a CD4 count below 350 cells/mm³ at the time of diagnosis had a lower chance of survival. The AIC and BIC values are smallest for the lognormal AFTM; hence we suggest that the lognormal AFTM could be the best model for the HIV/AIDS population. At the same time the loglogistic model cannot be ignored and could be considered as an alternative. The Cox-Snell residual plot (Fig. 1) confirms that all the parametric AFT models fit well.

Fig. 1. Cox-Snell residual plot for the different parametric AFT models
Since the plot of the residuals against the Nelson-Aalen estimator of their cumulative hazard function is approximately a straight line with slope 1, the models can be considered adequate.
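The diagnostic described here can be sketched as follows. This is an illustrative simplification, not the authors' procedure: it assumes each patient contributes a single time with an event indicator (the paper does not state how interval censored observations enter the residual plot), and it uses the lognormal model as an example. The function names are hypothetical.

```python
import math

def cox_snell_lognormal(t, lin_pred, sigma):
    """Cox-Snell residual r = -log S(t | x) under a lognormal AFT model."""
    w = (math.log(t) - lin_pred) / sigma
    return -math.log(0.5 * math.erfc(w / math.sqrt(2.0)))

def nelson_aalen(residuals, events):
    """Nelson-Aalen cumulative hazard of the residuals.

    Returns a list of (r, H(r)) at each distinct residual with an event.
    events: 1 if the event was observed, 0 if censored.
    """
    pairs = sorted(zip(residuals, events))
    n = len(pairs)
    out, cum_hazard, i = [], 0.0, 0
    while i < n:
        r = pairs[i][0]
        at_risk = n - i          # subjects with residual >= r
        deaths, j = 0, i
        while j < n and pairs[j][0] == r:
            deaths += pairs[j][1]
            j += 1
        if deaths > 0:
            cum_hazard += deaths / at_risk
            out.append((r, cum_hazard))
        i = j
    return out

# Tiny illustrative input: three residuals, the second one censored.
curve = nelson_aalen([0.1, 0.2, 0.3], [1, 0, 1])
```

If the model is adequate, the residuals behave like a censored sample from a unit exponential distribution, so plotting each pair in `curve` should give points close to the line through the origin with slope 1.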

Discussion and Conclusion
In this study we have tried to determine the factors associated with the survival of HIV/AIDS patients undergoing ART at an ART centre in Delhi, using parametric accelerated failure time models (viz. exponential, Weibull, lognormal, loglogistic) for interval censored survival data. Interval censored data appear very often in HIV/AIDS studies, especially when the event of interest cannot be observed exactly. The analysis of interval censored data is growing in importance, and methods for the analysis of interval censored survival data have been developed over the last two decades, but they are more complicated and harder to implement than their right censored counterparts.
There are many situations where the AFT model provides a better description of the data than the Cox PH model (Kay and Kinnersley, 2002; Nardi and Schemper, 2003). Orbe et al. (2002) suggested that the lognormal and loglogistic AFT models should be treated as an alternative choice when the proportional hazards assumption does not hold. Moreover, obtaining estimates from the Cox PH model is considerably more difficult for interval censored data. The results of our study strongly favour the lognormal AFT model as the most suitable one, better than the others on the basis of AIC and BIC values. An earlier paper by Nakhaee and Law (2011) suggested that the Weibull is the best fitting parametric model for predicting survival following a diagnosis of HIV infection, with or without a diagnosis of AIDS, and could be used for future projections of deaths from HIV/AIDS. Their analysis and inference were based on right censored data. Here, however, we recommend that the lognormal AFT model be employed to identify the independent predictors for HIV populations in interval censored survival data. Further studies are needed to confirm the findings.
Our findings suggest that the prognostic factors, viz., age, sex, CD4 cell count, past smokers, mode of transmission, baseline hemoglobin and baseline BMI, are statistically significant (p<0.001). Most of the previous studies have suggested that age is a significant prognostic factor (Saah et al., 1994; Bachani et al., 2010; Ghate et al., 2011; May et al., 2010; Kee et al., 2009). A younger person undergoing ART is more likely to survive longer than an older person, i.e., old age is associated with a high risk of disease progression, and our analysis likewise finds age to be a significant prognostic factor. Females are also observed to have better survival than their male counterparts. As reported previously, females had higher life expectancy than males (ATCC, 2008; Farzadegan et al., 1998; Donnelly et al., 2005; Stringer et al., 2006; Mageda et al., 2012). In contrast, Remafedi and Lauer (1995) found that the sex of the patient does not have any significant effect on survival time.
Consistent with the published literature, CD4 cell count is found to be an important prognostic marker for HIV/AIDS patients. Patients with a CD4 cell count below 200 cells/mm³ have a 3.31 times higher hazard of death than patients with a CD4 cell count above 200 cells/mm³. Mortality is inversely related to CD4 count: the cumulative probability of AIDS and death increases substantially with decreasing CD4 cell count (Ghate et al., 2011; ATCC, 2009; May et al., 2007; Rai et al., 2013; Rajasekaran et al., 2009; Kee et al., 2009). This emphasizes the importance of creating awareness about early diagnosis so that eligible patients can be initiated on ART earlier.
Another important finding of our study is that patients with a sexual (hetero or homo) mode of transmission had worse survival than patients with blood and intravenous drug use as the mode of transmission. This departs from the earlier result of Kumarasamy et al. (2003), who found that patients infected through intravenous drug use had the worst survival. However, Remafedi and Lauer (1995) have shown that there were no significant differences between deceased and other subjects in relation to mode of transmission. Unknown mode of transmission was found to be a significant factor in the Weibull, lognormal and loglogistic AFT models.
Current smoking was associated with a higher hazard of immune deterioration. We found that baseline hemoglobin was a significant predictor for HIV/AIDS patients. To our knowledge, hemoglobin has never been shown to predict mortality of patients on ART in India, and further studies are needed to confirm our findings. Baseline hemoglobin level can be used as a simple and practical tool for initial risk assessment in the absence of CD4 cell count and viral load, as was identified in earlier studies by Johannessen et al. (2008) in Tanzania, Jerene et al. (2006) in Ethiopia and Mocroft et al. (1999) in Europe.
BMI is positively associated with survival. This corroborates the findings of Bachani et al. (2010) that patients improved clinically with regard to weight and hemoglobin. Studies from resource-limited settings have also reported similar results (Jerene et al., 2006). Van der Sande et al. (2004) argued that BMI at the time of HIV diagnosis is a strong and independent predictor of survival. Unlike other studies, alcohol is not found to be a significant predictor in any of the models in our analysis.
In this study we have not considered some important factors, such as adherence to ART, presence of opportunistic infections including TB, HIV staging and initial ART regimen, that could potentially affect survival. Moreover, we have used data from only one ART centre, which may not be representative of the whole country. Considering the diversity of India, our results need to be substantiated by similar survival studies from other parts of India to build a comprehensive picture of HIV/AIDS epidemiology in the country.
Despite these limitations, our findings have policy implications: early initiation of ART is needed for patients with lower CD4 cell counts, and additional efforts and counseling are needed for older patients and for those with low BMI or low hemoglobin. The parametric lognormal AFT model should be employed to identify the independent predictors for HIV populations.