USING COLUMN GENERATION TECHNIQUE TO ESTIMATE PROBABILITY STATISTICS IN TRANSITION MATRIX OF LARGE SCALE MARKOV CHAIN WITH LEAST ABSOLUTE DEVIATION CRITERIA

The Least Absolute Deviation (LAD) method is the one of many methods used to estimate transition probability matrix of Markov Chain. It can be formulated as a Linear Programming Problem (LP) and solved using its regular state of the art software. However, when the Markov Chain has a large number of states and historical state probabilities data, the corresponding LP size can be very large reaching computer hardware/software limitation. The aim of this study is to apply the Column Generation (CG) technique to solve this large scale LP and to evaluate its extension beyond direct hardware/software capabilities. In this study, the sample state probabilities data were simulated statistically and two methods were used to solve the problem. The first method was using ‘linprog’ function in MATLAB to solve the related LP that all decision variables were considered simultaneously while the others was the CG method expected to require a much less percentage of all variables. As result effectiveness, both methods solved all test problems resulting equal LADs each. The CG method required more average time. Nevertheless, less than 30% of decision variables were considered in the CG method. The lesser percentages were found as the problem size grew. Moreover, larger size problems beyond direct use of software were solved using the proposed approach.


INTRODUCTION
Markov chain is widely used as model in many areas i.e., economics, marketing, capital theory, industrial structure, demography and social science which shown in the review of Dent and Ballintine (1971). Later, the Markov models have also received attention from researchers in various disciplines such as Craig and Sendi (2002) used Markov chain to study the changes of chronic diseases. Jones (2005;Christodoulakis, 2006;Simister, 2007) estimated the transition matrix of Markov chain to forecast the credit risk. Thyagarajan and Mohamed (2005) used Markov chain to analyze the retail banking. Lin (2011) applied Markov chain to establish an adaptive production procurement system. Constant and Zimmermann (2013) studied the migration of the population with Markov chain. Markov model is used to describe the chance of an event occurring in the next period when we know the chance of that event in the present period and transition probability matrix in the following manner:

Pe e =
(2) where, e is the n-element column vector of unit, n is the number of states. There are many researchers who proposed the methods to estimate probability statistics in transition matrices. Lee et al. (1968) suggested the Maximum Likelihood and Bayesian estimation. Madansky (1959;Kalbfleisch and Lawless, 1984) constructed the estimators by the Weighted Least Square method. The Least Square method without regard to assumption Equation 2 and 3 were used by Miller (1952;Goodman, 1953;Telser, 1966). Lee et al. (1965;Theil and Ray, 1966) offered the Restricted Least Square method that regard to Equation 2 and 3. The Minimum Absolute Deviation (MAD) was proposed by Ashar and Wallace (1963). In this study, the Restricted Least Squared (RLS) method is ignored because it is equivalent to solve the Quadratic Programming Problem (QPP). The solution of QPP is complex. As a result, it cannot solve a large scale problem. Therefore, the less complicated method than RLS will be diagnosed. The MAD method called Least Absolute Deviation (LAD) under constraints in Equation 2 and 3 is used to estimate the elements of the transition matrix when the approximation of ( ) k π was known. It shows that: The above model (A) consisted of Equation 4-6 has n linear constraints, n 2 of non-negativity constraints and decision variables. It can be reformulated as the following LP model: Equations 7-12 are the components of the corresponding LP problem referred as model (B) with n+2n(S-1) of linear constraints and n 2 + n(S-1) of nonnegativity constraints and decision variables. The explicitly comparison of the model (A) and (B) are shown in Table 1. The number of variables and constraints of model (A) are influenced by the number of states n of Markov Chain while the model (B) depends on both the number of states and the number of periods of historical data S. Therefore, the number of variables grows much faster than the number of constraints. If the number of states n or the number of periods S is very large up to computer capacity, the formulated LP can be unsolvable. This study is aimed to propose the use of column generation technique to solve this obstacle. In section 2, the proposed column generation algorithm is presented with an illustrated example in section 3. The computational results with discussions from the numerical experiment are shown in section 4. The conclusions of this study are finally summarized in section 5.

COLUM GENERATION
The processes of simplex method are reviewed. It shows that, for the minimization problem, the columns of constraint matrix that is nonnegative reduced cost are only used to determine. If the negative reduced cost is found, a column corresponding to this reduced cost will be produced. The technique that generates the coefficients associated with a variable as needed is called Column Generation (CG). The expectation of CG is that it can optimize the LP by examining only a subset of vector coefficients and it can avoid unnecessary computations. Therefore CG is often used to solve the large size LP. The concept of CG algorithm can be illustrated via the model (B) called Master Problem (MP), let X is a column vector of decision variables, c is coefficient vector of objective function, b is vector of right hand side of constraint and A is constraint matrix as follows: (1) Let N is the number of decision variables or the number of column in constraint matrix A which is equal to n(S-1)+n 2 . If r<N, then A can be written in form: where, A r is a sub-matrix of constraint matrix that contains coefficient of basic variables from Basic Feasible Solution (BFS) and A N-r is sub-matrix of coefficient of the remaining variables or non-basic variables.
The model that considers only the r variables is reduced to smaller problem of manageable size which is called Restricted Master Problem (RMP). An optimal solution of the RMP is not necessarily to be the optimal for the MP. An optimal dual solution of RMP (y) is used to calculate the reduced cost of non-basic variables. For choosing a variable to enter the basis the sub-problem is solved in the following manner: Assume that the variable X t is entering variable, a column of coefficients which according to X t is generated and added to RMP. The above process is implemented until the objective function cannot be improved.
In this study, the summary of the CG algorithm for estimating all elements in the transition matrix of Markov Chain by LAD method reformulated into LP model is given below.

Initialization
Choose the BFS (x B is a subvector of x). Assuming the values of U j(k) , for all j, k and p ij , for i = j are 1. So the x B consists of U j(k) , for all j, k and p ij , for i = j. It means that there are n(S-1)+n of the number of variables in RMP.

Main Step
• Solve the RMP with 'linprog' function of MATLAB • Use the vector of the shadow price or dual variables (y) from step 1 to calculate the reduced cost of nonbasic variables in the following manner: for every j, then the result from step 1 is optimal solution and if not, a column A t that according to will be generated and added to RMP and repeat to step 1 The further detail of CG technique can be studied in Bazaraa et al. (2010;Lasdon, 1970;Nash and Sofer, 1996). The detail of 'linprog' function can be examined at the website of MATLAB (2013).

EXAMPLE
In this section, the example drawn from (Theil and Rey, 1966) is considered. The market shares of three cigarette brands in the United States during the period 1939-43 are presented in Table 2.
The vector of X, c, b and the matrix of A are shown in Table 3.    Table 4. The vector of X, c, b and the matrix of A for restricted master problem u 1(1) u 2(1) u 3(1) u 1(2) u 2(2) u 3(2) P 11 The CG technique is used to solve this model. For initialization step, assuming the values of U j(k) ; j = 1,2,3; k = 1,2 and p 11 , p 22 , p 33 are 1. So the RMP that is used to solve in first step of main step consists of all vectors and matrix which shown in Table 4.
In iteration 1, there are 9 available variables in RMP. The 'linprog' function of MATLAB is used to the RMP. The value of LAD is 0.0898. The reduced cost ( ) j c ɶ of all non-basic variables are calculated in the manner that show in step 2 of main step. It found that, the reduced cost of p 12 , p 32 and p 13 are negative and the minimum of them is p 12 . It means that the 0.0890 of LAD is not minimum value. Therefore a column of constraint matrix that according to p 12 is generated and added to RMP. The number of available variables in iteration 2 is 10. Following the main step in above section until the reduced cost of all non-basic variables are nonnegative. The results of each iteration are presented in Table 5.
The result of last iteration is: (1)

NUMERICAL RESULTS
In this study an Intel(R) Core(TM) i3-2120 3.30 GHz CPU 4 GB RAM is used to perform the experiments. The maximum possible array is 640 MB which the product of number of rows and number of columns of the constraint matrix must not exceed 6.4×10 8 bytes. The trials start on generating square transition probability matrix P order n and n×S matrix of data according to the Equation 1 Subsequently, prepare the matrices and vectors according to the requirements of the 'linprog' function in MATLAB. The next step is solving the LP by two methods one is directly solved by 'linprog' function that all of decision variables are available and the other is using CG method as the algorithm shown in section 2.
The numerical computation is presented in the Table  6. It is found that when the size of constraint matrix of model (B) is not over 10 8 bytes, the directly solved by 'linprog' function can be used to optimize the LP. It has less average time to estimation than CG method. Both methods give the same average of LAD in every scenarios. Note that in the solution when the number of periods for data collection S is 10, CG method use less than 30% of variables while the directly solved by 'linprog' function requires all variables. The number of variables that use in CG technique is increased when S is expanded. If the size of constraint matrix of model (B) is over 10 8 bytes, the directly solved by 'linprog' function cannot be used, but CG can.
The comparison between the RLS and LAD method is shown in the Table 7 and 8. It is found that, the LAD method has surprisingly less SSE than the RLS on average implying that solving the problem linearly can maintain a better numerical stability computationally. The average time to estimation of LAD method which directly solved by 'linprog' function will be less than RLS method when the number of periods for data collection S is 10. In the case that the number of states n is more than 300, the LAD method which solve by CG technique is only method that suitable to estimate the transition matrix.

CONCLUSION
The probability statistics in transition matrix of large scale Markov Chain estimated by LAD method reformulated to LP model is studied. If the size of constraint matrix of LP is checked and found that it is not over the maximum possible array, the 'linprog' function in MATLAB should be used to optimize the LP and if not, when the number of states n is a large number and the number of periods for data collection S is a small number, the CG algorithm is an interesting method to be used to solve that problem because it uses less than 30% of the total problem variables and the number of variables used decreases when the number of states n is increased.

ACKNOWLEDGEMENT
The researchers would like to express our sincere appreciation to comments and recommendations from anonymous referees leading to the improved final version of this manuscript.