Toward a Hierarchical Bayesian Framework for Modelling the Effect of Regional Diversity on Household Expenditure

Problem statement: Household expenditure analysis is in high demand by government for policy formulation. Because household data have a hierarchical structure, with households nested within regions of residence that differ from one another, a contextual welfare analysis is needed. This study develops a hierarchical model for estimating household expenditure that measures the effect of regional diversity by taking district characteristics and household attributes into account using a Bayesian approach. Approach: Because the variation in household expenditure data is captured by the three-parameter Log-Normal (LN3) distribution, the model was built on the LN3 distribution. The data used in this study are household expenditure data for Central Java, Indonesia. Since the data are unbalanced and hierarchical models estimated with a classical approach work well for balanced data, estimation was carried out with a Bayesian method using MCMC and Gibbs sampling. Results: The hierarchical Bayesian model based on the LN3 distribution could be implemented to explain the variation in household expenditure using district characteristics and household attributes. Conclusion: The model shows that district characteristics, namely the demographic and economic conditions of districts and the availability of public facilities that are strongly associated with the dimensions of the human development index (economic, education and health), affect household expenditure through household attributes.


INTRODUCTION
Regional income distribution can determine the ability of a region to bring about change and improvement for its people, such as reducing poverty. Inequality in regional income distribution does not create wealth for society in general, but only for certain groups. According to BPS (2010b), inequality of income distribution can be viewed from three sides. The first is relative inequality, i.e., the size distribution of income disparities. The second is rural-urban income disparity, which is usually caused by development that is oriented more toward urban areas. This urban-biased development often occurs in developing countries such as Indonesia. The third is regional income disparity, which in Indonesia is generally attributed to disparities in economic development between regions and inequality in the distribution of natural resources between regions.
Broadly, the factors that affect welfare problems fall into two main categories: behavior paradigms and policy paradigms (Akita and Pirmansyah, 2011). Behavior paradigms relate to the efforts and responsibilities of each individual or household in achieving its welfare level. Within each household, there are specific factors that potentially contribute to this behavior paradigm. Policy paradigms, by contrast, are associated with economic conditions, politics and government policy. In addition, non-household factors may also affect differences in the level of welfare. Examples are community-level factors such as geography and the availability of public facilities (economic, education and health facilities).
Income per capita is an economic indicator that is often used to measure prosperity and wellbeing. Analysis of household income is essential for formulating government policy. However, household income is generally very difficult to measure accurately, especially in developing countries. Household income and household expenditure are not the same thing, but the relationship between the two is very strong. Akita and Pirmansyah (2011) state that consumption expenditure is more reliable than income as an indicator of a household's permanent income because it does not vary as much as income in the short term. For these reasons, household expenditure patterns are widely used to analyze patterns of household income. Indonesia has changed its system of governance from a centralized to a decentralized one since 1999. Consequently, the achievements of a local government are largely determined by its active and innovative role in setting local policy to achieve prosperity and welfare for its residents. Since Indonesia is vast and regional conditions vary, a contextual welfare analysis that takes regional diversity into account is needed to formulate government policy. Shahateet (2006) shows that there is a regional effect on income inequality.
Central Java is one of the provinces on Java Island in Indonesia. It is known as the heart of Javanese culture because the culture of Central Java is diverse and includes a variety of cultures from other provinces in Java. The total area of Central Java is 32,800.69 km², or approximately 25.34% of the total area of Java Island (BPS, 2010a). Its poverty rate was about 16.6% of its population in 2010 (BPS, 2010a). That figure is higher than the average percentage of poor people in Indonesia (13.3%) (BPS, 2010a). In 2011, the local government succeeded in reducing the percentage of poor people in Central Java to around 15.76% (BPS, 2011).
Administratively, the province of Central Java is divided into 35 districts consisting of 29 regencies and 6 cities. The differences in household expenditure levels between districts in Central Java can be seen in Fig. 1, which shows that mean household expenditure varies between districts and that districts in urban areas have a higher mean household expenditure than rural areas.
The distribution of household expenditure has a shape close to a right-skewed distribution such as the Log-Normal. Battistin et al. (2007) state that the Log-Normal distribution provides a useful theoretical model for studying certain economic populations such as income and expenditure distributions. The two-parameter Log-Normal distribution, however, is insufficient to capture the variation in the empirical distribution of household data in Central Java. The three-parameter Log-Normal distribution (LN3) is therefore applied to explain the variation of the data. The probability density function for LN3 is given by Eq. 1, where:
µ>0 = The location parameter
τ>0 = The scale parameter
-∞<λ<∞ = The threshold parameter
Eq. 1 shows that LN3 has an additional parameter, the threshold parameter, which shifts the whole distribution curve above zero. This characteristic matches expenditure data, which never take the value zero.
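For reference, the LN3 density is commonly written in the following textbook form. This is a sketch under an assumption: the symbol σ (the standard deviation of ln(y − λ)) is introduced here for illustration only, and the text does not state whether its scale parameter τ is a standard deviation or a precision. The second line shows the precision convention τ = 1/σ², which is the parameterization WinBUGS uses for its Log-Normal distribution.

```latex
% Standard three-parameter Log-Normal density (assumed form, sigma is illustrative)
f(y \mid \mu, \sigma, \lambda)
  = \frac{1}{(y-\lambda)\,\sigma\sqrt{2\pi}}
    \exp\!\left\{-\frac{\left[\ln(y-\lambda)-\mu\right]^{2}}{2\sigma^{2}}\right\},
  \qquad y > \lambda

% Same density under the precision convention \tau = 1/\sigma^{2}
f(y \mid \mu, \tau, \lambda)
  = \frac{\sqrt{\tau}}{(y-\lambda)\sqrt{2\pi}}
    \exp\!\left\{-\frac{\tau}{2}\left[\ln(y-\lambda)-\mu\right]^{2}\right\},
  \qquad y > \lambda
```

In either form, setting λ = 0 recovers the two-parameter Log-Normal distribution, so the threshold is the only extra degree of freedom.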
Since households are nested within their region of residence, household data have a hierarchical structure. Household expenditure can therefore be influenced by factors at several different levels, i.e., factors at the household level and factors at the regional level.
Hierarchical models are formulated for analyzing data with complex sources of variation (Raudenbush and Bryk, 2002). Cases with complex sources of variation are frequently attributed to the hierarchical structure of the data (Goldstein, 1995; Hox, 1995). A hierarchical data structure classifies the data into multiple levels. Standard single-level methods are not appropriate for analyzing such a hierarchical system (Maas and Hox, 2004) because the parameter estimates are inefficient and the standard errors are negatively biased (Hox, 1995; Maas and Hox, 2004). Raudenbush and Bryk (2002), Goldstein (1995) and Hox (1995) proposed hierarchical models that combine the several levels of hierarchical data into a single statistical analysis. Hierarchical models mostly use a classical approach in the estimation process. For complex hierarchical models, however, parameter estimates are very difficult to derive using the classical approach. Raudenbush and Bryk (2002) demonstrate that a hierarchical model using a classical approach works well when the data are balanced and the number of higher-level units is large. In some applications, however, these conditions do not easily hold.

MATERIALS AND METHODS
Residential conditions and facilities are frequently used as visual indicators to judge the level of socioeconomic welfare of a household. A number of previous studies show that several household attributes affect household expenditure, i.e., household size, education level of the household head, house area, type of wall, type of floor, source of drinking water, kitchen, toilet facilities and electricity (Iriawan and Ismartini, 2011; Haughton and Nguyen, 2010; Mok et al., 2007; Grosh and Baker, 1995). This study uses predictors based on those previous studies, called micro variables, together with other predictors, called macro variables, that are investigated for their influence on household expenditure. Public service facilities are an example of macro variables, since the availability of those facilities illustrates concrete steps of local government policy in improving people's welfare. The sample coverage area of the data used in this study is Central Java Province.
Preliminary analysis of the data is shown in Fig. 2, which presents simple regression lines for five districts in Central Java that differ in both their slopes and their intercepts. This indicates that there is variation at the district level, i.e., a regional influence, so hierarchical analysis should be employed for this problem. This study proposes to model the effect of community characteristics and household attributes on household expenditure in Central Java Province, Indonesia, using a hierarchical Bayesian model based on the three-parameter Log-Normal distribution.
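The kind of preliminary check described above can be sketched as follows: fit a separate simple regression line in each district and compare intercepts and slopes. This is a minimal illustration on synthetic data, not the Susenas data or the authors' code. The district labels, the single predictor, the function names `fit_line` and `fit_by_district`, and all numeric values are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_district(records):
    """records: iterable of (district, x, log_y); returns {district: (intercept, slope)}."""
    groups = {}
    for district, x, log_y in records:
        xs, ys = groups.setdefault(district, ([], []))
        xs.append(x)
        ys.append(log_y)
    return {d: fit_line(xs, ys) for d, (xs, ys) in groups.items()}

# Synthetic illustration only: three hypothetical districts with different lines.
rng = random.Random(0)
true_lines = {"A": (12.0, 0.10), "B": (12.5, 0.02), "C": (11.8, 0.20)}
records = []
for district, (a, b) in true_lines.items():
    for _ in range(200):
        x = rng.uniform(5, 40)  # hypothetical household attribute
        records.append((district, x, a + b * x + rng.gauss(0, 0.3)))

fits = fit_by_district(records)
for district, (intercept, slope) in sorted(fits.items()):
    print(district, round(intercept, 2), round(slope, 3))
```

If the fitted intercepts and slopes differ clearly between districts, as they do by construction here, a single pooled regression would hide district-level variation, which is the motivation for a hierarchical model.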

Data descriptions:
This study relies extensively on household expenditure data collected by the National Socioeconomic Survey (Susenas), which is conducted regularly by Statistics Indonesia (BPS). The dependent variable in the model is household expenditure per capita (y). Several household attributes, as micro variables (X), and district characteristics, as macro variables (W), are considered to affect household expenditure per capita. The micro variables are type of house wall (X1), type of house floor (X2), floor area per capita (X3), type of drinking water source (X4), toilet facilities usage (X5), type of cooking fuel (X6), household size (X7), education level of the household head (X8) and whether the head of household works in agriculture (X9). The macro variables are population density (W1), ratio of primary schools to primary school age children (W2), ratio of junior high schools to junior high school age children (W3), ratio of senior high schools to senior high school age children (W4), number of health facilities (W5), number of medical personnel (W6), percentage of villages having a public phone (W7), number of cooperatives, i.e., establishments whose members are people or establishments with the legal status of a cooperative and whose activities are based on the people's economic movement (W8), number of large and medium enterprises (W9), number of small/household industries (W10), gross regional domestic product at current prices per capita (W11) and percentage contribution of revenue to budget revenue (W12).

Fig. 2: Simple regression lines for five districts in Central Java
Log-normal hierarchical models: A hierarchical model is formed by two sub-models, i.e., micro models (the models at the lower level) and macro models (the models at higher levels) (Goldstein, 1995). For the two-level hierarchical model of household expenditure in Central Java, the micro model investigates the association between household expenditure and various household attributes, while the macro model examines the relation between the coefficients of the micro model and district characteristics.
Suppose N is the number of households sampled from m districts and $n_j$ is the number of households sampled in the j-th district, so that $N = \sum_{j=1}^{m} n_j$. Let $y_j$ be the response in the micro model and $X_j$ the micro variables, where j = 1, 2, ..., m. $y_j$ is an $n_j \times 1$ vector and $X_j$ is an $n_j \times p$ matrix, where p = k+1 and k is the number of micro variables. The micro model based on the Log-Normal distribution is specified as follows (Stata, 2009):

$$y'_j = X_j \beta_j + r'_j \tag{2}$$

where $y'_j = \ln(y_j)$, $r_j$ is the residual vector of the micro model and $r'_j = \ln(r_j)$. $\beta_j$ is the $p \times 1$ coefficient vector of the micro model. The macro model can therefore be specified as follows:

$$\beta_j = W_j \gamma + u_j \tag{3}$$

where $W_j$ is the $p \times q$ matrix of macro variables with q = l+1 and l the number of macro variables, $\gamma$ is the coefficient vector of the macro model and $u_j$ is the residual vector of the macro model. The single-equation model for Eq. 2 and 3 can be specified as follows:

$$y'_j = X_j W_j \gamma + X_j u_j + r'_j \tag{4}$$

Referring to Eq. 2, 3 and 4, the two-level hierarchical Bayesian model for household expenditure in Central Java is defined as follows:

$$y'_{ij} = \beta_{0j} + \sum_{k=1}^{9} \beta_{kj} X_{kij} + r'_{ij}; \quad i = 1, 2, \ldots, n_j, \quad j = 1, 2, \ldots, 35 \tag{5}$$

$$\beta_{pj} = \gamma_{p0} + \sum_{l=1}^{12} \gamma_{pl} W_{lj} + u_{pj}; \quad p = 0, 1, 2, \ldots, 9, \quad l = 1, 2, \ldots, 12 \tag{6}$$

Bayesian inference: Consider Bayes' Theorem (Box and Tiao, 1992; Gelman and Hill, 2007):

$$p(\theta \mid z) = \frac{f(z \mid \theta)\, p(\theta)}{p(z)} \tag{7}$$

where $\theta$ and z are both random, $\theta$ is the parameter vector and z denotes the vector of observations from the sample. p(z) is the normalizing constant with respect to $\theta$. The posterior can then be represented in proportional form as follows:

$$p(\theta \mid z) \propto f(z \mid \theta)\, p(\theta) \tag{8}$$

Eq. 8 shows that the posterior is proportional to the combination of prior information and the current information in the data. All information about the unknown parameters of interest is contained in their joint posterior distribution. Based on Eq. 7, the joint posterior distribution of the two-level hierarchical model for household expenditure can be expressed as:

$$p(\beta, \gamma, \lambda, \tau_{[y]}, \tau_{[\beta]} \mid y) = \frac{f(y \mid \beta, \lambda, \tau_{[y]})\; p_1(\beta \mid \gamma, \tau_{[\beta]})\; p_2(\gamma, \lambda, \tau_{[y]}, \tau_{[\beta]})}{p(y)} \tag{9}$$

where $p_1(\beta \mid \gamma, \tau_{[\beta]})$ is a first-stage prior for the random parameters and $p_2(\gamma, \lambda, \tau_{[y]}, \tau_{[\beta]})$ is a second-stage prior, or hyperprior, for the hyperparameters. Since p(y) does not depend on the parameters, the posterior can be written as:

$$p(\beta, \gamma, \lambda, \tau_{[y]}, \tau_{[\beta]} \mid y) \propto f(y \mid \beta, \lambda, \tau_{[y]})\; p_1(\beta \mid \gamma, \tau_{[\beta]})\; p_2(\gamma, \lambda, \tau_{[y]}, \tau_{[\beta]}) \tag{10}$$

Eq. 10 is the proportional form of the posterior for the two-level hierarchical model.
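To make the two-level structure concrete, the data-generating process implied by the micro and macro models can be sketched as a small simulation. This is an illustration under simplifying assumptions, not the authors' model: it uses one micro variable and one macro variable instead of nine and twelve, normal residuals on the log scale, no threshold parameter, and hypothetical names (`simulate_two_level`, `sd_u`, `sd_r`) and coefficient values.

```python
import math
import random

def simulate_two_level(gamma, w, n_per_district, sd_u, sd_r, rng):
    """Simulate household expenditure from a simplified two-level model.

    gamma[p] = (gamma_p0, gamma_p1) for p = 0 (intercept) and p = 1 (slope).
    w[j] is the single macro variable of district j.
    Returns (betas, households), where households holds (j, x, y) tuples.
    """
    betas, households = [], []
    for j, (w_j, n_j) in enumerate(zip(w, n_per_district)):
        # Macro model: beta_pj = gamma_p0 + gamma_p1 * W_j + u_pj
        beta_j = [g0 + g1 * w_j + rng.gauss(0.0, sd_u) for g0, g1 in gamma]
        betas.append(beta_j)
        for _ in range(n_j):
            x = rng.uniform(0.0, 1.0)  # hypothetical micro variable
            # Micro model on the log scale: y' = beta_0j + beta_1j * x + r'
            log_y = beta_j[0] + beta_j[1] * x + rng.gauss(0.0, sd_r)
            households.append((j, x, math.exp(log_y)))
    return betas, households

rng = random.Random(42)
gamma = [(12.0, 0.5), (0.3, -0.1)]  # hypothetical macro coefficients
w = [0.2, 1.0, 1.8]                 # hypothetical district characteristic
n_per_district = [5, 12, 30]        # deliberately unbalanced sample sizes
betas, households = simulate_two_level(gamma, w, n_per_district, 0.1, 0.3, rng)
print(len(households), [round(b, 2) for b in betas[0]])
```

The unbalanced district sizes mirror the situation described in the text, where the number of sampled households differs between districts.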
In Bayesian inference, all parameters need a prior distribution. The prior distributions proposed in this study are treated as independent (Box and Tiao, 1992; Carlin and Chib, 1995) and comprise a combination of conjugate and informative prior distributions and pseudo priors.
Inference about the subset of focal parameters of interest is derived from its marginal conditional distribution. The marginal conditional distribution is calculated by integrating Eq. 10 with respect to the auxiliary unknown parameters, which tends to require complex numerical integration. To overcome that problem, the Bayesian method draws repeated samples from the full conditional posterior distributions using MCMC and Gibbs sampling (Gelman et al., 2004; Gelman and Hill, 2007; Ntzoufras, 2009).
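The idea of sampling in turn from full conditional distributions can be illustrated with a deliberately small stand-in model. The sketch below is not the LN3 hierarchical model of this study: it assumes a single-level normal model for log-expenditure, y_i ~ N(mu, 1/tau), with conjugate priors mu ~ N(m0, 1/p0) and tau ~ Gamma(a, rate b). The function name `gibbs_normal`, the prior settings and the synthetic data are all hypothetical.

```python
import random

def gibbs_normal(y, n_iter, rng, m0=0.0, p0=1e-6, a=0.01, b=0.01):
    """Gibbs sampler for y_i ~ N(mu, 1/tau), mu ~ N(m0, 1/p0), tau ~ Gamma(a, rate=b).

    Returns a list of (mu, tau) draws, one per iteration.
    """
    n = len(y)
    y_bar = sum(y) / n
    mu, tau = y_bar, 1.0  # starting values
    draws = []
    for _ in range(n_iter):
        # Full conditional of mu given tau: normal with precision p0 + n*tau
        prec = p0 + n * tau
        mean = (p0 * m0 + tau * n * y_bar) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        # Full conditional of tau given mu: Gamma(a + n/2, rate = b + SS/2).
        # random.gammavariate takes a scale argument, hence the reciprocal.
        ss = sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a + n / 2, 1.0 / (b + ss / 2))
        draws.append((mu, tau))
    return draws

rng = random.Random(1)
y = [rng.gauss(13.0, 0.5) for _ in range(300)]  # synthetic log-expenditure
draws = gibbs_normal(y, 2000, rng)[500:]        # discard 500 burn-in draws
post_mu = sum(d[0] for d in draws) / len(draws)
post_sd = (sum(1.0 / d[1] for d in draws) / len(draws)) ** 0.5
print(round(post_mu, 2), round(post_sd, 2))
```

Each iteration only requires draws from standard distributions, which is why Gibbs sampling avoids the numerical integration mentioned above. In the hierarchical model the same cycle would run over every parameter block.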
The estimation of the parameters of interest is implemented in WinBUGS 1.4, a software package for Bayesian computation. The iterative estimation process run by WinBUGS is derived from the Directed Acyclic Graph (DAG) of the hierarchical model. Figure 3 shows the DAG of the two-level hierarchical Bayesian model for household expenditure in Central Java as the implementation of Eq. 5 and 6.

DISCUSSION
The two-level hierarchical Bayesian model shows that household welfare levels in Central Java can generally be indicated by several household attributes. First, household welfare can be identified from housing conditions such as a good type of wall and floor and the floor area per capita. Second, in the majority of cases, welfare can also be identified by the availability of facilities for daily needs such as clean water sources, toilet ownership and a good cooking fuel. Third, the human capital of the household, for instance the number of people in the household and the education level of the household head, affects household welfare as well. In 18 districts, households that are economically active in the agriculture sector generally have a lower welfare level than others. According to BPS (2010c), those 18 districts mainly have a high percentage of wetland area and a high poverty level compared to other districts. In Brebes, for example, about 37.73% of the area is wetland and the percentage of poor people (24.39%) is the fifth highest among districts in Central Java.
District characteristics positively affect household welfare through specific household attributes. Those district characteristics are the demographic and economic conditions of districts and the availability of public facilities, i.e., economic, education and health facilities, which are strongly associated with the dimensions of the human development index. This relation shows that better availability of those public facilities yields higher welfare for the people. In terms of the economic dimension, the number of small/household industries also has a positive effect on household welfare. This is reasonable since industry can create job opportunities for the people in the district.

CONCLUSION
This study has demonstrated a model for estimating household expenditure that measures the effect of regional diversity by taking into account district characteristics and household attributes, using a hierarchical Bayesian approach based on the three-parameter Log-Normal distribution. The results show that regional diversity affects household expenditure. The efforts of local government in providing public facilities are statistically associated with improved welfare of its people. An interesting direction for future research is to investigate other specific district characteristics and household attributes that might affect household expenditure.