GENERALIZED LINEAR MIXED MODELS WITH SPATIAL RANDOM EFFECTS FOR SPATIO-TEMPORAL DATA: AN APPLICATION TO DENGUE FEVER MAPPING

The Generalized Linear Mixed Models (GLMMs) with sp atial random effects for spatio-temporal data are proposed. A hierarchical Bayesian method is used fo r parameter estimation. The random effects are assumed to be normally distributed and the spatial random effects are assumed to be proper Conditional Autoregressive (CAR) models. The proposed models ar e applied to Dengue fever data in Northern Thailand, including climatic covariates, rainfall a nd temperature. The Dengue fever maps are construct ed from the posterior mean of the mortality rates.


INTRODUCTION
Spatio-temporal data are data collected across both time and space. Thus the data analysis has to take into account the spatial correlation across the areas and temporal correlation within each area. In Thailand, there are annual data reports of common infectious diseases from the Ministry of Public Health (MOPH) of Thailand every year. In the reports, raw data are presented and descriptive statistics, such as rates, percentages and bar charts are usually used to describe the features of those data (MOPH, 2011). It will be more informative and easier for the readers to understand if those data are analyzed thoroughly and presented in maps which are socalled disease maps (Lawson, 2008). Those data motivated us to investigate the models for the spatiotemporal data for disease mapping.
The spatio-temporal models for disease mapping found in prominent papers (Bernardinelli et al., 1995;Waller et al., 1997;Xia and Calin, 1998;Sun et al., 2000;Knorr-Held and Besag 1998;Nobre et al., 2005;Martinez-Beneito et al., 2008) are based on linear predictors, which may have all terms or a subset of them, expressed as Equation (1): where, η ijk , i = 1,..,I; j = 1,…,J; k = 1,…,k are linear predictors and β is a p×1 vector of covariates X ijk . C k , S i , T j are additional risks of belonging to group k, living in area i and period j. CS ik , CT ik , ST ij and CST ijk are interaction terms. The ε ijk are error terms. The observed data, Y ijk , typically, are assumed to be Poisson distributed. For example, assuming that Y ijk are Poisson distributed, Waller et al. (1997) propose the linear predictors as η ijk = C k +S i and S i are broken to be the sum of a heterogeneity, u i and a spatial effect, v i , i.e., S i = u i +v i . u i are assumed to be normal distributed and v i are assumed to be Conditional Autoregressive (CAR) models. Generalized Linear Mixed Models (GLMMs) have grown in popularity due to their ability to model Science Publications JMSS different types of data including spatio-temporal data (Diggle et al., 2002;McCulloch et al., 2008). In GLMMs, specifically, let Y ij , j = 1,…,n i be marginally correlated observations from area i = 1,…,m. The GLMMs assume that conditional on random effects b i , Y ij , j = 1m,…,n i , are independent and follow a distribution from the exponential family with density f(Y ij |η ij , ϕ) = exp {ϕ −1 [Y ij η ij , -ψ(η ij )]+c(Y ij , ϕ)} where ψ(.) and c(.) are known functions, η ij are natural parameters and ϕ is a scale parameter, E(Y ij |b i ) = u ij = ψ`(η ij ) and Var(Y ij |b i ) = ϕψ``(η ij ) = ϕψ``(ψ`− 1 (µ ij )) = ϕv(µ ij ); thus the variances are related to the means through the variance function v(.) = ψ``(ψ`− 1 (µ ij )). The conditional mean µ ij = E(Y ij |b i ) can be modeled as Equation (2): With the canonical link function, Equation (2) can be written as Equation (3): where, β is a p×1 vector of fixed effects, for population, related to covariates X ij and b i is a q×1 vector of random effects, for each subject, related to covariates Z it .
Typically, iid i q b~N (0, D) is assumed. The correlation of two observations related to time is expresses as Equation (4): ). The most common distributions from this family are Binomial, Poisson and Normal which are associated to logit, log and identity canonical link function respectively.
The spatial correlation in GLMMs can be accounted by modifying the random effect part to include the spatial random effects, v i , for area i, i = 1,…,m, in Equation (3). Thus the GLMMs with spatial random effects are expressed as Equation (5): The common way, in disease mapping, is to assign a Conditional Autoregressive (CAR) model first introduced by Besag (1974) to the spatial random effects which are not observed. The CAR model is defined by the conditional probability density function, Since the joint distribution of v i is improper, a remedy is to introduce a spatial parameter. A suitable constrained, this parameter ensures a proper joint distribution for the CAR model (Gelfand and Vounatsou, 2003). The modified version is so-called a proper CAR model and where W is the adjacency matrix with entries w i1 = 1 if area i and 1 are neighbors and w i1 = 0 otherwise, with the diagonal entries w ii = 0 and i+ il l w = w ∑ , τ v is a conditional variance and its magnitude determine the amount of spatial variation and ρ is the spatial parameter. The most commonly used spatial effects is based on some form of a Conditional Autoregressive (CAR) structure (Clayton and Kaldor, 1987;Cressie and Chan, 1989;Besag et al., 1991;Bernardinelli et al., 1995;Waller et al., 1997;Pascutto et al., 2000). There are several methods including a hierarchical Bayesian method used for parameter estimation in GLMMs. Moreover, a hierarchical Bayesian method has been extensively used for CAR models. Thus, in this study, the GLMMs, proper CAR models and hierarchical Bayesian method are adopted for modeling spatio-temporal data in order to construct disease maps.
As mentioned earlier, the infectious disease data motivated us to investigate the models for the spatiotemporal data for disease mapping. Dengue fever, an infectious disease caused by a family of viruses that are transmitted by mosquitoes is chosen to study. It is one of major public health problems in Thailand. In 2011, a total of 69,800 cases and 63 fatalities were reported from all provinces. The morbidity rate was 111.10 per 100,000 people. The Case Fatality Rate (CFR) was 0.0903% (MOPH, 2011). The disease maps are needed as they are communication tools to the general public about which geographic areas and people are at high risk for the Dengue fever.
In this study, we propose GLMMs with proper CAR spatial random effects to analyze Dengue data associated with rainfall and temperature in order to construct Science Publications JMSS disease maps. The proposed models are specific cases of Equation (1). The random effects are assumed to be normally distributed and the spatial random effects are assumed to be proper CAR models. The observations are assumed to be Poisson distributed. The proposed models are applied to Dengue fever data in Northern Thailand.
The subsequent sections are as follows. Firstly, for materials and methods, GLMMs with proper CAR spatial random effects are described and an application to Dengue fever data is illustrated. Then the results of data analysis are presented. Finally, the discussion and conclusion are conducted respectively.

MATERIALS AND METHODS
The observed data, Y ij , the numbers of patients in area i, I = 1,…,m, at time j, j = 1,…,n i , are assumed to be Poisson distributed with a natural log link function. The proposed models based GLMMs and proper CAR models are expressed as Equation (6): For each area, Equation (6) can be written as η I = β+Z i b i +1v i where µ = E(Y i |b i, v i ). The sizes of vectors or matrices are: η i (n i ×1), Y i (n i ×1), X i (n i ×p), β (p×1), Z i (n i ×q), b i (q×1) and 1 (n i ×1). For all areas, Equation (6) can be written as η = Xβ+Zb+Sv where µ = E(Y|b,v). . Without loss of generality, V can be reparameterized by including a sum to zero constraint on v i . Then we get V∼car.proper(0,τ v (D w -ρW) −1 ). Under a hierarchical Bayesian framework, the specification of prior distributions, in particular for variance components, is required. In the absence of subjective prior information, a prior on each βis assumed to be normally distributed with zero mean and large variance, uniform (0,1) for ρ and inverse Wishart for D. When D is univariate, the inverse Wishart reduces to inverse gamma. A posterior inference can be easily implemented using standard Gibbs sampling Markov Chain Monte Carlo (MCMC).
For an application, the Dengue data in 2011 consisted of 68 observations from 17 provinces in Northern Thailand (MOPH, 2011). Each province contributed 4 quarterly observations over time. Let Y ij , j = 1,…,4, i = 1,…,17 denote the numbers of Dengue fever patients in province i at quarter j. In stead of modeling counts, offset variables are used for modeling rates which are cases per population sizes. The model can be expressed as Equation (7): where, log(pop i ) are offset variables, pop i are the numbers of population, rain ij are total rainfall, temp ij are temperature b i are random intercepts capturing geographically unstructured heterogeneity among provinces and v i are proper CAR spatial random effects capturing spatial dependence among provinces. Bayesian inference is used to fit the model by assuming a N(0,τ b ) for b i , a proper CAR model for the joint distribution of v i , i.e., V∼car.proper(0,τ v (D w -ρW) −1 ) and a uniform (0,1) for ρ. We use independent N(0m,10 6 ) priors for all fixed effect parameters.
τ v ∼invGamma (0.001,0.001) andτ b ∼invGamma (0.001,0.001) The MCMC Gibbs sampling was run via Open BUGS. The MCMC convergence is checked by the value of the potential scale reduction of Gelman and Rubin (1992) (Rhat) and visual examination of trace, autocorrelation, history and density plots. The Rhat values substantially closed to 1 indicate convergence.

RESULTS
The estimated regression coefficients and variance components are presented in Table 1, from MCMC runs of 30,000 iterations after discarding the initial 10,000 iterations as burn-in. The value of Rhat closed to 1 for every parameter in Table 1 indicates that the MCMC is converged. The result shows that the morbidity risk or Relative Risk (RR) of Dengue fever will be increased when the amount of rainfall increases and also will be increased when the temperature increases. Since the spatial parameter ρ is not closed to zero, there is a spatial association among provinces. However, according to τ v , the variation among areas is quite small. The high risk provinces and Quarters (Q) are shown in Table 2.
The Dengue fever maps for Q1 to Q4 of the provinces in Northern Thailand constructed from the posterior mean of the morbidity rates are shown in Fig.  1-4. It is clearly seen that the highest risk are in Q3.

DISCUSSION
The proposed models are based on GLMMs and proper CAR models. A hierarchical Bayesian method is used for parameter estimation. They are different from the model of Waller et al. (1997) in the way that the proper CAR models are adopted, instead of improper CAR models. Moreover, the proposed models allow random factors to be included and inverse Wishart would be applied for their variance component. The results of the analysis confirm that the Dengue fever in Northern Thailand were significantly associated with the rain and temperature. The Dengue fever maps clearly show which areas are at high risk in each quarter. They are valuable public health tools to identify the areas where an intervention is required for the Dengue fever control.

CONCLUSION
The GLMMs with proper CAR spatial random effects under a hierarchical Bayesian framework for spatio-temporal data are proposed. The proposed models are applied to Dengue fever data in Northern Thailand. The covariates considered are rain and temperature. The Dengue fever maps which are important tools for Dengue fever control are produced using the posterior mean of the mortality rates.

ACKNOWLEDGEMENT
We gratefully thank Assoc. Prof. Dr. Yisheng Li for his valuable advice and Mr. Allen Doyle for his kind proofreading. We also thank the Department of Statistics and the Faculty of Science of Kasetsart University for technical support.