Quadratic Discriminant Analysis of Dengue Viruses Disease Incidence in Palembang

1 Jurusan Matematika, Fakultas Matematika dan Ilmu Pengetahuan Alam, Universitas Sriwijaya, Jl. Raya Palembang-Prabumulih Km.32 Inderalaya 3066, Sumatera Selatan, Indonesia
2 Pusat Pengajian Sains Matematik, Fakulti Sains dan Teknologi, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor Darul Ehsan, Malaysia
3 Jurusan Biologi, Fakultas Matematika dan Ilmu Pengetahuan Alam, Universitas Sriwijaya, Jl. Raya Palembang-Prabumulih Km.32 Inderalaya 3066, Sumatera Selatan, Indonesia


Introduction
Palembang is the capital of South Sumatra province and covers an area of 400.61 km² (CBS, 2014). The Musi river divides the city into two regions, Seberang Ilir (245.97 km²) and Seberang Ulu (154.64 km²). The two regions differ in topography: Seberang Ulu is relatively flat, while the elevation of Seberang Ilir varies between 4 and 20 m above sea level. Palembang has a tropical climate with relatively humid winds, a temperature of 23.4-31.7°C and an average rainfall of 227.23 mm per year (BMKG, 2008). These tropical conditions favor the breeding of the Aedes aegypti mosquito, the vector of dengue virus disease.
According to the Indonesian Ministry of Health (2011), as cited in Ismah (2014), other factors besides the tropical climate that favor the breeding of the Aedes aegypti mosquito are economic conditions, type of housing, population density, the hygienic behavior of the population and health management. Salamah et al. (2012) explored the effects of climatic and socioeconomic factors on the number of dengue virus disease cases in 13 districts of East Java. They employed a semiparametric panel regression approach and compared it with standard panel regression. Their results showed that the number of dengue virus disease cases is significantly influenced by per-capita income and by the number of people below 15 years of age. They also noted that dengue virus disease incidence in East Java peaks at a humidity of 82-87%, a temperature of 22-27°C and a rainfall of 1500-3670 mm, with Surabaya as the district most responsive to changes in the climate variables. Hii (2013) revealed that dengue virus disease incidence in Singapore (2000-2012) is significantly affected by the mean temperature and the cumulative rainfall. Toan (2015) identified the factors of dengue virus disease incidence in Hanoi based on the demographic characteristics of patients. She employed logistic regression and found that living near open sewers increased the risk of incidence by a factor of 7.9, the highest risk among the associated factors.
The area over which dengue virus disease spreads in Palembang can be divided into five areas: Center, North, West and East in Seberang Ilir, and South in Seberang Ulu. Ismah (2014) showed that in 2009-2013 the incidence of dengue virus disease in Palembang was concentrated in the North and Central areas, both of which have a high population density, a high percentage of healthy homes, a low percentage of poor people and a low Larvae Free Index (LFI). The survey results showed that the majority of the population in both areas keep many water reservoirs.
In this study, we explore the factors associated with dengue virus disease incidence in each area of Palembang using quadratic discriminant analysis. In economic and financial research, discriminant analysis was applied by Siqueira et al. (2012), who selected and evaluated the factors that discriminate between the most profitable and the less profitable stocks. They investigated the fundamental and financial variables and the CAPM beta coefficient of stocks traded on the Sao Paulo Stock Exchange (BM and FBovespa) in 2006-2010. Li and Shao (2015) proposed discriminant analysis, in particular quadratic discriminant analysis, for sparse high-dimensional data in order to classify colon tumor tissues. Arasi et al. (2016) applied discriminant analysis to the factors that influence the success of websites for online business services. Their research found that the factors of availability and compliance in e-service were significantly different.

Data Collection and Description
We collected the characteristics of dengue virus disease incidence for the five areas of Palembang through a survey. The survey population comprised housewives who had a family member suffering from dengue virus disease in 2015. According to PKSS (2015), there were 979 cases of dengue virus disease in Palembang in 2015. The respondents had lived in these areas for at least one year. Stratified sampling was used to select the respondents from each area, with the number of samples determined by the Slovin formula. The percentage of dengue virus disease incidence in each area is presented in Table 1.
The questionnaire consisted of five parts. The first part covered information on the family member suffering from dengue virus disease, such as the age, gender and blood group of the patient. The second part covered information on the respondent and her husband, such as age, educational level, occupation and income per month. The third part explored the respondent's knowledge of dengue virus disease, assessed through 21 questions with the answer options 'true' or 'false'; the questions focused on dengue virus disease transmission, the vector and clinical manifestations. The fourth part explored behavior and prevention practices, such as the use of mosquito repellent, mosquito nets, mosquito wire screens and fogging, the management of used containers, the keeping of larvae-eating fish and the planting of mosquito-repellent plants. This behavior was assessed through 17 questions with the answer options 'always', 'sometimes' or 'never'. The fifth part covered the type of residence, the number of bedrooms, the number of bathrooms that have tubs and the clean water source.
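The sample-size step mentioned above can be illustrated with a short sketch. Slovin's formula is n = N / (1 + N e²), where N is the population size and e the margin of error. The source gives N = 979 but not e, so the value e = 0.05 below is an assumption. The area shares are also hypothetical placeholders, since Table 1 is not reproduced here, and the function names are illustrative only.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))


def proportional_allocation(total_sample: int, shares: dict) -> dict:
    """Split a total sample across strata in proportion to their shares.

    Uses the largest-remainder rule so the allocated counts sum to total_sample.
    """
    raw = {area: total_sample * share for area, share in shares.items()}
    counts = {area: math.floor(value) for area, value in raw.items()}
    leftover = total_sample - sum(counts.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda area: raw[area] - counts[area], reverse=True)
    for area in by_fraction[:leftover]:
        counts[area] += 1
    return counts


# N = 979 is taken from the text; e = 0.05 is an assumed margin of error.
n_total = slovin_sample_size(979, 0.05)

# Hypothetical area shares (NOT the values of Table 1).
example_shares = {"South": 0.20, "Center": 0.25, "North": 0.25, "West": 0.15, "East": 0.15}
print(n_total, proportional_allocation(n_total, example_shares))
```

With an assumed 5% margin of error, the formula yields 284 respondents out of 979 cases; a different margin would change this figure.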

Quadratic Discriminant Analysis
Consider $X = (X_1, X_2, \ldots, X_r, \ldots, X_{17})^T$ as the vector of independent variables, or characteristics, for the incidence of dengue virus disease. Let $Y_i$ be the i-th area of dengue virus disease incidence, $i = 1, 2, \ldots, j, \ldots, k$, and let each $Y_i$ have $n_i$ samples, so that the total number of samples is $n = \sum_{i=1}^{k} n_i$. Let $S_i$ be the covariance matrix and $p_i$ the prior probability of the i-th area. The purpose of quadratic discriminant analysis in this study is to map the n samples into 5 areas based on the quadratic discriminant score $Z_i$. A respondent who has a family member suffering from dengue virus disease is allocated to area $Y_j$ if that area gives the largest quadratic discriminant score (equivalently, the smallest quadratic distance). The quadratic discriminant score of the i-th area is defined, in its standard form, as:
$$Z_i(X) = -\frac{1}{2}\ln|S_i| - \frac{1}{2}(X - \bar{X}_i)^T S_i^{-1}(X - \bar{X}_i) + \ln p_i \qquad (1)$$
where $\bar{X}_i$ denotes the sample mean vector of the i-th area. This score function assumes that the groups follow multivariate normal distributions with unequal covariance matrices.
In this study, we considered 17 categorical variables in mapping the n samples into the 5 areas, namely age ($X_1$), gender ($X_2$) and blood group ($X_3$) of the patient; age ($X_4$), education ($X_5$), occupation ($X_6$), income per month ($X_7$), knowledge of dengue virus disease ($X_8$) and behavior regarding prevention practices ($X_9$) of the respondent; age ($X_{10}$), education ($X_{11}$), occupation ($X_{12}$) and income per month ($X_{13}$) of the head of the family; type of residence ($X_{14}$); number of bedrooms ($X_{15}$), number of bathrooms that have tubs ($X_{16}$) and clean water source ($X_{17}$). Variables $X_2$ and $X_8$ have 2 categories, variables $X_1$ and $X_9$ have 3 categories, variable $X_3$ has 4 categories and the other variables have 5 categories. The categorical variables were transformed into continuous variables using the Successive Interval method.
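The allocation rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard quadratic discriminant score Z_i(x) = -0.5 ln|S_i| - 0.5 (x - m_i)^T S_i^{-1} (x - m_i) + ln p_i, where m_i is the mean vector of area i. The function names and the two-variable toy means, covariances and priors are hypothetical and do not come from the survey data.

```python
import numpy as np


def quadratic_discriminant_score(x, mean, cov, prior):
    """Standard QDA score for one group: larger means a better match."""
    diff = x - mean
    # slogdet returns (sign, log|cov|); it is more stable than log(det(cov)).
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance, computed without forming an explicit inverse.
    sq_distance = diff @ np.linalg.solve(cov, diff)
    return -0.5 * logdet - 0.5 * sq_distance + np.log(prior)


def classify(x, means, covs, priors):
    """Return the index of the group with the largest score, plus all scores."""
    scores = [
        quadratic_discriminant_score(x, m, s, p)
        for m, s, p in zip(means, covs, priors)
    ]
    return int(np.argmax(scores)), scores


# Toy example with two groups and two variables (hypothetical values).
toy_means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
toy_covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
toy_priors = [0.5, 0.5]

group, scores = classify(np.array([2.5, 3.2]), toy_means, toy_covs, toy_priors)
print(group, scores)
```

Because each group keeps its own covariance matrix, the decision boundary between two groups is quadratic in x rather than linear, which is the reason for preferring this score when the covariance matrices are not homogeneous.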

Results and Discussion
We employed several tests to check the assumptions of the quadratic discriminant score. The Box's M test (α = 5%) showed that the covariance matrices of the North, South, Center, West and East areas were not homogeneous (Box's M = 1802.15 > F(α; df) = 670.66). The tests of equality of means across the areas, using Wilks' lambda and the F test as given in Table 2, showed that the gender and blood type of the family member who suffered from dengue ($X_2$ and $X_3$); the age, education, occupation, income and knowledge of the respondent ($X_4$, $X_5$, $X_6$, $X_7$ and $X_8$); the education and income of the family head ($X_{11}$ and $X_{13}$); and the type of residence, number of bathrooms that have tubs and clean water source ($X_{14}$, $X_{16}$ and $X_{17}$) differed between areas (for the Wilks' lambda test, the value is close to one and for the F test, p-value < α).
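For readers who want to reproduce a homogeneity check of this kind, the following is a minimal sketch of the Box's M statistic with its usual chi-square approximation. It is not the authors' computation: the text above compares the statistic with an F-based critical value, whereas this sketch uses the chi-square form, and the function name and the simulated data are hypothetical.

```python
import numpy as np


def box_m_statistic(groups):
    """Box's M test of equal covariance matrices.

    groups: list of (n_i x p) arrays, one per group, with p >= 2.
    Returns (M, chi2_approx, df), where chi2_approx = (1 - c) * M is
    approximately chi-square with df = p (p + 1) (k - 1) / 2 under H0.
    """
    k = len(groups)
    p = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups])
    total = n.sum()

    covs = [np.cov(g, rowvar=False) for g in groups]  # unbiased, ddof = 1
    pooled = sum((ni - 1) * s for ni, s in zip(n, covs)) / (total - k)

    def logdet(s):
        return np.linalg.slogdet(s)[1]

    m_stat = (total - k) * logdet(pooled) - sum(
        (ni - 1) * logdet(s) for ni, s in zip(n, covs)
    )
    # Small-sample correction factor for the chi-square approximation.
    c = (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) * (
        np.sum(1.0 / (n - 1)) - 1.0 / (total - k)
    )
    df = p * (p + 1) * (k - 1) // 2
    return float(m_stat), float((1 - c) * m_stat), df


# Simulated example: the second group has a much larger spread.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(40, 3))
group_b = 5.0 * rng.normal(size=(40, 3))
print(box_m_statistic([group_a, group_b]))
```

A large statistic relative to the reference distribution indicates that the covariance matrices differ, which supports using a separate covariance matrix per group.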
The incidence of dengue virus disease in the North, South, Central, West and East areas was affected by all factors except the patient's age ($X_1$), the respondent's behavior regarding prevention practices ($X_9$), the age and occupation of the family head ($X_{10}$ and $X_{12}$) and the number of bedrooms ($X_{15}$).
The results of the Box's M test and the equality of means tests were also supported by the plot of the Mahalanobis distance against the chi-square distribution shown in Fig. 1. The points did not follow a straight line, indicating that the covariance matrix is not homogeneous and that the mapping of dengue virus incidence can be constructed using a different covariance matrix for each area, as in Equation 1.
The quadratic discriminant scores for the South, Center, North, West and East areas, together with the largest score in each area, are shown in Table 3. Table 4 gives the number of respondents mapped into each of the 5 areas (South, Central, North, West and East) based on the largest quadratic discriminant score, $Z_i$. The results showed that 30.2, 66.1, 66.7, 44.4 and 88.0% of the respondents, respectively, were correctly mapped into their own area, so that the overall correct percentage of the mapping is 66.7%. The remaining 69.8, 33.9, 33.3, 55.6 and 11.1% of the respondents, respectively, were mapped to other areas. Based on the percentage of correctly mapped respondents, the model is sufficient for mapping the incidence of dengue fever into five areas in Palembang.
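The per-area and overall percentages above are derived from a classification (confusion) table. As a hedged illustration of that arithmetic, the sketch below computes both from a table whose rows are actual areas and whose columns are predicted areas. The counts are invented for the example and are not the values of Table 4; the function name is hypothetical.

```python
def mapping_accuracy(confusion):
    """Per-area and overall fraction of correctly mapped respondents.

    confusion[i][j] is the number of respondents from area i mapped to area j.
    """
    per_area = [row[i] / sum(row) for i, row in enumerate(confusion)]
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return per_area, correct / total


# Invented counts for five areas (NOT the data of Table 4).
areas = ["South", "Central", "North", "West", "East"]
example_table = [
    [12, 8, 10, 4, 6],
    [3, 40, 10, 4, 3],
    [4, 12, 50, 5, 4],
    [5, 6, 8, 20, 6],
    [1, 2, 2, 1, 44],
]

per_area, overall = mapping_accuracy(example_table)
for name, rate in zip(areas, per_area):
    print(f"{name}: {100 * rate:.1f}% correctly mapped")
print(f"Overall: {100 * overall:.1f}%")
```

Note that the overall percentage weights each area by its number of respondents, so it is generally not the simple average of the per-area percentages.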
One of the main benefits of mapping the incidence of dengue virus disease is that the results identify the characteristics that significantly affect the incidence of dengue in each area, and the government can use this information in managing the prevention and control of the disease.

Conclusion
This paper has applied the quadratic discriminant analysis approach to map dengue virus disease into 5 areas of Palembang: South, Central, North, West and East. The results showed that the mapping of dengue virus disease incidence in each area is significantly affected by all factors except the age of the family member suffering from dengue virus disease, the respondent's hygienic behavior, the age and occupation of the family head and the number of bedrooms. The gender and blood group of the patient; the age, education, occupation, income and knowledge of the respondent; the education and income of the family head; the type of residence; the number of bathrooms that have tubs; and the clean water source differ between areas according to the Wilks' lambda test and the F test. The overall correct percentage of the mapping results is 66.7%. This percentage, which is above 50%, indicates that the proposed model is sufficient for mapping the incidence of dengue fever into five areas in Palembang.