Estimating the Claim Severity Distribution using Variable Neighborhood Search

Corresponding Author: Kunjira Kingphai, Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand
Email: kunjira.k@grads.nida.ac.th

Abstract: In this study, Variable Neighborhood Search (VNS) is used to estimate the parameters of claim severity distributions for an actual motor insurance claims data set, and the estimates are compared with those obtained by the Moment Estimation Method (MOM) and the Maximum Likelihood Estimation Method (MLE), which are well-established methods. The Kolmogorov-Smirnov (K-S) test is then used to assess how well each fitted distribution fits the actual claims. The results show that the lognormal distribution whose parameters were estimated by VNS fits the actual motor claims data set better than the fits obtained with the other two techniques, with the K-S test performed at the 0.01 significance level.


Introduction
Modeling the distribution of claim data is an important task that general insurance companies regularly face. Finding an appropriate statistical distribution from the available claim data, and testing how well that distribution fits the data, is therefore necessary for predicting future claim severity (Tosaporn and Sattayatham, 2012). A key step in modeling the distribution of these data is estimating the parameters of the model. In this study, we use a meta-heuristic to estimate the parameters of the claim severity distribution. A meta-heuristic is a method for finding good solutions to difficult, complex problems that cannot be solved by exact methods, or for which exact methods would take too long. However, a meta-heuristic cannot guarantee that its solution is as good as the exact optimum. Many meta-heuristic algorithms have been proposed, such as Simulated Annealing, Tabu Search, Ant Colony Optimization, Particle Swarm Optimization and Variable Neighborhood Search (VNS). Soontorn et al. (2013) evaluated the severity of fire accidents by using a Randomized Neighborhood Search (RNS) method to estimate the Weibull parameters. Babak et al. (2011) maximized the likelihood function of the three-parameter Weibull distribution with a new methodology that combines Variable Neighborhood Search (VNS) and Simulated Annealing (SA). Similarly, Nima et al. (2012) proposed a new hybrid meta-heuristic that combines VNS with an Iterated Local Search (ILS) algorithm, and applied it to maximize the likelihood function of the four-parameter Burr III distribution.
In this study, we use a meta-heuristic in the form of Variable Neighborhood Search (VNS) for parameter estimation because the procedure is not complicated and it has rarely been used for parameter estimation in statistics. Related works include Ming-Jun and Liu (2015), Christopher and Schmid (2009) and Yang (2015); they can be viewed as generalizations or specializations of this paper in some respects, and some of them have built the heuristic foundation for MOM.
The remainder of the paper is organized as follows. First, the background of Variable Neighborhood Search (VNS) is briefly introduced. We then fit models to a real data set and estimate their parameters with MOM, MLE and VNS. Next, the parameter estimates are reported, and how well each distribution fits is examined in the Checking Model Fit section. Finally, we conclude and suggest further research for those interested in parameter estimation using the VNS method.

The Background of Variable Neighborhood Search: VNS
Nenad and Hansen (1997) introduced a technique named Variable Neighborhood Search, which explores distant neighborhoods of the current incumbent solution and moves to a new solution only if it is an improvement. The technique is built on the local search method. It can be applied to many kinds of problems, whether continuous or discrete, including linear, nonlinear and integer programs.
In programming a VNS algorithm, we consider the following:
• $N_s$ ($s = 1,\ldots,s_{\max}$), a set of pre-selected neighborhood structures, where $N_s(\theta)$ is the set of solutions in the $s$th neighborhood of $\theta$. Pierre et al. (2008) note that "most local search heuristics use only one neighborhood structure, i.e., $s_{\max} = 1$".
• An initial solution $\theta$, which may be a random draw from some distribution or may be generated by another method.
• The target function.
• The stopping criterion, chosen at the researcher's discretion, e.g., the maximum number of iterations between two improvements or the maximum CPU time.
• A local optimizer for the local search step, whose choice often depends on the properties of the target function. Common choices are Fletcher-Powell (FP), Fletcher-Reeves (FR), Hooke-Jeeves (HJ), Nelder-Mead (NM), Rosenbrock (RO) and Steepest Descent (SD) (Nenad et al., 2008). Here we use the Iterated Local Search (ILS) algorithm (Nima et al., 2012).

The steps of the VNS algorithm are as follows.
Initialization: Select the set of neighborhood structures $N_s$, $s = 1,\ldots,s_{\max}$; an initial solution $\theta$; and a stopping criterion. Then repeat the following until the stopping criterion is met:
Set $s = 1$ and repeat until $s = s_{\max}$:
a) Shaking: Generate a point $\theta'$ at random from the $s$th neighborhood of $\theta$ ($\theta' \in N_s(\theta)$);
b) Local search: Apply a local search method with $\theta'$ as the initial solution, and denote by $\theta''$ the local optimum obtained;
c) Neighborhood change: If this local optimum is better than the incumbent, move there. That is, with $f(\theta)$ the target function to be maximized, if $f(\theta'') > f(\theta)$, set $\theta = \theta''$ and continue the search with $N_1$ ($s = 1$); otherwise, set $s = s + 1$.

VNS is a meta-heuristic that is easy to use and quickly finds good solutions to a variety of problems (Bessadok et al., 2009).
It is especially useful in this study, in cases where parameter estimates cannot be obtained by standard statistical methods.
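Steps a) to c) can be illustrated with a minimal Python sketch of basic VNS for maximizing a function of a real parameter vector. It assumes box-shaped neighborhoods of increasing half-width (the list `radii`, a hypothetical choice) and uses a simple coordinate pattern search as the local optimizer in place of the methods named above; the test function in the usage example is also hypothetical.

```python
import random


def local_search(f, theta, step=0.1, tol=1e-6):
    """Coordinate-wise pattern search that climbs f starting from theta.

    A simple stand-in local optimizer (assumed, not one of the methods listed
    in the text). The step is halved whenever no coordinate move improves f.
    """
    theta = list(theta)
    best = f(theta)
    while step > tol:
        improved = False
        for j in range(len(theta)):
            for delta in (step, -step):
                cand = list(theta)
                cand[j] += delta
                val = f(cand)
                if val > best:
                    theta, best, improved = cand, val, True
        if not improved:
            step /= 2
    return theta


def basic_vns(f, theta, radii, max_rounds=50, seed=0):
    """Basic VNS maximizing f; radii[s] sets the size of neighborhood N_(s+1)."""
    rng = random.Random(seed)
    theta = list(theta)
    best = f(theta)
    for _ in range(max_rounds):  # stopping criterion: number of rounds
        s = 0  # start from the first neighborhood structure
        while s < len(radii):
            # a) Shaking: random point in a box of half-width radii[s] around theta
            shaken = [t + rng.uniform(-radii[s], radii[s]) for t in theta]
            # b) Local search from the shaken point gives theta''
            cand = local_search(f, shaken)
            val = f(cand)
            # c) Neighborhood change: move and restart at N_1, else try the next N_s
            if val > best:
                theta, best, s = cand, val, 0
            else:
                s += 1
    return theta, best


if __name__ == "__main__":
    # Hypothetical target with its maximum at (1, -2)
    target = lambda th: -(th[0] - 1.0) ** 2 - (th[1] + 2.0) ** 2
    print(basic_vns(target, [0.0, 0.0], radii=[0.5, 1.0, 2.0]))
```

Returning to $N_1$ after every improvement keeps the search local while it is productive, and the larger neighborhoods are only tried when the smaller ones fail.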

Modeling on a Real Data Set with Expected Distribution
In this study, we use a real motor claim data set collected between 2005 and 2009 from a public non-life insurance company in Thailand. Because claim data in non-life insurance are highly skewed (Martin, 2012), a plot of the empirical distribution function of the actual claim data was compared with the cumulative distribution functions of reference right-skewed distributions (Philip, 2007; Stuart et al., 2012). The Lognormal and Gamma distributions gave the closest fits, as can be observed in Fig. 1 and 2.
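The probability density function (pdf) of the Lognormal distribution is defined by:
$$ f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0 \tag{1} $$
where $-\infty < \mu < \infty$ and $\sigma > 0$ are the location parameter and scale parameter, respectively. The pdf of the Gamma distribution is defined by:
$$ f(x;\alpha,\beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \quad x > 0 \tag{2} $$
where $\alpha > 0$ and $\beta > 0$ are the shape parameter and scale parameter, respectively.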
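As a quick check on the parametrizations in Equations 1 and 2 (under the scale parametrization the Gamma mean is $\alpha\beta$), the two densities can be written in Python as below. This is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Equation 1: Lognormal density; mu and sigma are the mean and sd of ln X."""
    if x <= 0:
        return 0.0  # the density is defined for x > 0 only
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, alpha, beta):
    """Equation 2: Gamma density with shape alpha and scale beta."""
    if x <= 0:
        return 0.0  # the density is defined for x > 0 only
    # Work on the log scale (math.lgamma) to avoid overflow for large alpha
    log_f = ((alpha - 1.0) * math.log(x) - x / beta
             - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_f)
```

Evaluating the Gamma density on the log scale keeps $\Gamma(\alpha)$ and $\beta^{\alpha}$ from overflowing when the shape or scale is large, which matters for claim amounts measured in currency units.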

Moment Estimation Method (MOM)
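From the probability density function in Equation 1, the MOM estimators are found by equating the first two sample moments $m_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k$ ($k = 1, 2$) to the corresponding population moments $E(X) = e^{\mu+\sigma^2/2}$ and $E(X^2) = e^{2\mu+2\sigma^2}$, which gives:
$$ \tilde{\sigma}^2 = \ln\left(\frac{m_2}{m_1^2}\right) \tag{3} $$
$$ \tilde{\mu} = \ln m_1 - \frac{\tilde{\sigma}^2}{2} \tag{4} $$
When $x$ is a random variable from the Gamma distribution in Equation 2, for which $E(X) = \alpha\beta$ and $E(X^2) = \alpha(\alpha+1)\beta^2$, the moment estimators are:
$$ \tilde{\alpha} = \frac{m_1^2}{m_2 - m_1^2} \tag{5} $$
$$ \tilde{\beta} = \frac{m_2 - m_1^2}{m_1} \tag{6} $$

Maximum Likelihood Estimation Method (MLE)
For a random sample $x_1,\ldots,x_n$ from the Lognormal distribution, the likelihood function is:
$$ L(x;\mu,\sigma^2) = \prod_{i=1}^{n} \frac{1}{x_i\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x_i - \mu)^2}{2\sigma^2}\right) \tag{7} $$
The MLEs $\hat{\mu}$ and $\hat{\sigma}$ of $\mu$ and $\sigma$ are obtained by setting the partial derivatives of $\ln L(x;\mu,\sigma^2)$ with respect to $\mu$ and $\sigma$ equal to zero. The maximum likelihood estimates of $\mu$ and $\sigma$ are then:
$$ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i \tag{8} $$
$$ \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\ln x_i - \hat{\mu})^2} \tag{9} $$
The Hessian at this point is negative definite, indicating a strict local maximum.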
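The MOM estimators for both distributions and the Lognormal MLE are closed-form, so they can be computed directly. The following Python sketch (function names hypothetical; standard library only) returns them in the order $(\mu, \sigma)$ or $(\alpha, \beta)$.

```python
import math
from statistics import fmean


def lognormal_mom(x):
    """Equations 3 and 4: match m1 = E(X) and m2 = E(X^2) of the Lognormal."""
    m1 = fmean(x)
    m2 = fmean([xi * xi for xi in x])
    sigma2 = math.log(m2 / (m1 * m1))
    mu = math.log(m1) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def gamma_mom(x):
    """Equations 5 and 6: alpha = m1^2 / (m2 - m1^2), beta = (m2 - m1^2) / m1."""
    m1 = fmean(x)
    var = fmean([xi * xi for xi in x]) - m1 * m1  # (1/n) * sum (x_i - xbar)^2
    return m1 * m1 / var, var / m1


def lognormal_mle(x):
    """Equations 8 and 9: mean and (1/n)-divisor standard deviation of ln x."""
    logs = [math.log(xi) for xi in x]
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(l - mu) ** 2 for l in logs]))
    return mu, sigma
```

For example, `gamma_mom([1.0, 2.0, 4.0])` gives approximately `(3.5, 0.667)`, and the corresponding mean $\tilde{\alpha}\tilde{\beta}$ reproduces the sample mean $7/3$, as the method of moments requires.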
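Similarly, for the Gamma distribution, the likelihood function is obtained as Equation 10:
$$ L(x;\alpha,\beta) = \prod_{i=1}^{n} \frac{x_i^{\alpha-1} e^{-x_i/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \tag{10} $$
The MLEs $\hat{\beta}$ and $\hat{\alpha}$ of $\beta$ and $\alpha$ are obtained by setting the partial derivatives of $\ln L(x;\alpha,\beta)$ with respect to $\beta$ and $\alpha$ equal to zero. The maximum likelihood estimates of $\beta$ and $\alpha$ then satisfy:
$$ \hat{\beta} = \frac{\bar{x}}{\hat{\alpha}} \tag{11} $$
$$ \ln\hat{\alpha} - \psi(\hat{\alpha}) = \ln\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i \tag{12} $$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function. Since Equation 12 cannot be solved in closed form, $\hat{\alpha}$ is obtained by the Newton-Raphson iteration:
$$ \alpha_{k+1} = \alpha_k - \frac{\ln\alpha_k - \psi(\alpha_k) - c}{1/\alpha_k - \psi'(\alpha_k)}, \qquad c = \ln\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i \tag{13} $$
which is repeated until $|\alpha_{k+1} - \alpha_k| < \varepsilon$, where $\varepsilon > 0$ is a predefined error tolerance.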
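The Newton-Raphson step in Equation 13 needs the digamma function $\psi$ and its derivative $\psi'$ (trigamma). The Python sketch below uses small standard-library approximations of both (recurrence plus asymptotic series, an implementation choice not taken from the paper), starts from the MOM estimate of $\alpha$ (also an assumption), and then recovers $\hat{\beta}$ from Equation 11. The function names are hypothetical.

```python
import math
from statistics import fmean


def digamma(x):
    """psi(x) = Gamma'(x) / Gamma(x) for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    f = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132)))))


def trigamma(x):
    """psi'(x) for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    f = 1.0 / (x * x)
    return result + 1.0 / x + f / 2 + f / x * (1 / 6 - f * (1 / 30 - f * (1 / 42 - f / 30)))


def gamma_mle(x, eps=1e-10, max_iter=100):
    """Solve Equation 12 by Newton-Raphson (Equation 13), then apply Equation 11."""
    xbar = fmean(x)
    c = math.log(xbar) - fmean([math.log(xi) for xi in x])  # c > 0 unless all x equal
    alpha = xbar ** 2 / (fmean([xi * xi for xi in x]) - xbar ** 2)  # MOM start (assumed)
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - c
        g_prime = 1.0 / alpha - trigamma(alpha)
        new_alpha = alpha - g / g_prime
        if new_alpha <= 0:
            new_alpha = alpha / 2  # safeguard: stay inside the parameter space
        converged = abs(new_alpha - alpha) < eps  # |alpha_(k+1) - alpha_k| < epsilon
        alpha = new_alpha
        if converged:
            break
    return alpha, xbar / alpha  # (alpha_hat, beta_hat)
```

Because $\ln\alpha - \psi(\alpha)$ is decreasing and convex, Newton-Raphson converges monotonically once an iterate falls below the root; the positivity safeguard only matters if a first step from the right overshoots past zero.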

Variable Neighborhood Search Method (VNS)
The main reason for using the VNS algorithm to find parameter estimates is that, although maximum likelihood estimators have many desirable properties, in practice the likelihood function can be hard to maximize and the estimators may not have a closed form. We therefore present the VNS algorithm used to find the parameter estimates.
In designing the VNS algorithm, we considered the following details:
• The initial solution $\theta$ is chosen from the MOM and MLE estimates, and the likelihood function $L(\theta)$ is calculated at this initial solution, where $\theta$ is the vector of parameters.
• In this study, VNS is performed under the desirable properties of the MLE method: the search is intended to maximize the likelihood functions of the Lognormal and Gamma distributions given in Equations 7 and 10, respectively. The target function is therefore the likelihood function.
• The neighborhood of the VNS method ($N_k$) is created by Equation 14 from Babak et al. (2011), where $u$ is a random number generated from the uniform distribution on [0, 1], $d$ is either 1 or −1 and sets the direction of the new neighbor, and $r$ is the span of the neighborhood generator of the VNS algorithm. Here $r$ is set to 1 while the target function improves; whenever the target function does not improve, $r$ is increased by 0.001. In this study only one neighborhood structure is used, so $k_{\max} = 1$.
• The search stops after 10,000 iterations or when $|\theta'' - \theta| < 0.001$.
• Random numbers $v$ and $s$ are generated for the ILS algorithm in the local search step to produce neighboring values.
• In the local search step of the VNS algorithm, we use Iterated Local Search (ILS), which can be split into three stages: creating an initial solution, improving it, and choosing the best solution found.
• Let $n$ denote the number of search iterations.
• The search steps are repeated until the stopping condition is reached.

To apply the VNS procedure, we first compute the MOM and MLE parameter estimates from the actual claim data, as shown in Table 1.
We then obtained the VNS estimates of the parameters of the Lognormal distribution, given in Table 2, and of the Gamma distribution, given in Table 3.
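The settings above can be combined into a Python sketch that maximizes the Lognormal log-likelihood (Equation 7) from an initial estimate. Several parts are assumptions rather than the authors' implementation: the shaking rule $\theta + d\,r\,u$ is a hypothetical stand-in for Equation 14, the ILS step is a simplified perturb-and-climb loop built on a coordinate pattern search (with the perturbation vector playing the role of $v$), and the claim amounts in the usage example are made up for illustration. Maximizing $\ln L$ instead of $L$ gives the same maximizer and avoids numerical underflow.

```python
import math
import random


def make_lognormal_loglik(x):
    """Return a function theta -> ln L(x; mu, sigma) for Equation 7."""
    logs = [math.log(xi) for xi in x]
    n, sum_logs = len(logs), sum(logs)

    def loglik(theta):
        mu, sigma = theta
        if sigma <= 0:
            return -math.inf  # outside the parameter space
        ss = sum((l - mu) ** 2 for l in logs)
        return (-sum_logs - n * math.log(sigma)
                - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma ** 2))

    return loglik


def hill_climb(f, theta, step=0.05, tol=1e-6):
    """Coordinate pattern search (assumed local optimizer inside ILS)."""
    theta, best = list(theta), f(theta)
    while step > tol:
        moved = False
        for j in range(len(theta)):
            for delta in (step, -step):
                cand = theta.copy()
                cand[j] += delta
                val = f(cand)
                if val > best:
                    theta, best, moved = cand, val, True
        if not moved:
            step /= 2  # refine the step once no coordinate move helps
    return theta, best


def ils(f, start, rng, n_kicks=5, kick=0.05):
    """Simplified Iterated Local Search (assumed form): climb from the start
    point, perturb the best point with a random vector v and climb again,
    and keep the best local optimum found."""
    best_theta, best_val = hill_climb(f, start)
    for _ in range(n_kicks):
        v = [rng.uniform(-kick, kick) for _ in best_theta]
        cand, val = hill_climb(f, [t + vi for t, vi in zip(best_theta, v)])
        if val > best_val:
            best_theta, best_val = cand, val
    return best_theta, best_val


def vns_estimate(f, theta0, max_iter=10_000, tol=1e-3, seed=1):
    """VNS with a single neighborhood structure (k_max = 1) maximizing f."""
    rng = random.Random(seed)
    theta, best = list(theta0), f(theta0)
    r = 1.0  # neighborhood span
    for _ in range(max_iter):
        # Shaking: hypothetical stand-in for Equation 14, theta + d * r * u
        shaken = [t + rng.choice((1, -1)) * r * rng.random() for t in theta]
        cand, val = ils(f, shaken, rng)  # local search gives theta''
        moved = max(abs(c - t) for c, t in zip(cand, theta))
        if val > best:
            theta, best, r = cand, val, 1.0  # improvement: span back to 1
        else:
            r += 0.001  # no improvement: widen the span by 0.001
        if moved < tol:  # stop when |theta'' - theta| < 0.001
            break
    return theta, best


if __name__ == "__main__":
    claims = [1200.0, 850.0, 3400.0, 560.0, 2100.0, 15000.0, 980.0, 4300.0, 760.0, 2600.0]
    f = make_lognormal_loglik(claims)
    print(vns_estimate(f, (7.5, 1.0)))  # hypothetical starting values
```

Since a move is accepted only when the log-likelihood increases, the result is never worse than the MOM or MLE starting point; for the Lognormal, whose MLE is available in closed form, the search should settle close to Equations 8 and 9.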

Results
We fitted the Lognormal and Gamma distributions selected in the section Modeling on a Real Data Set with Expected Distribution to the actual motor claim data set described above. The Moment Estimation Method, the Maximum Likelihood Estimation Method and the Variable Neighborhood Search Method were applied to estimate the parameters. The parameter estimates are shown in Tables 1 to 3.

Checking Model Fit
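To show how well the distributions fit our data set, the Kolmogorov-Smirnov test (K-S test) is applied. The K-S test statistic is defined by $KS = \max\{KS^+, KS^-\}$, where
$$ KS^+ = \max_{1 \le i \le n}\left(\frac{i}{n} - \hat{F}(x_{(i)})\right), \qquad KS^- = \max_{1 \le i \le n}\left(\hat{F}(x_{(i)}) - \frac{i-1}{n}\right), $$
$x_{(1)} \le \cdots \le x_{(n)}$ are the ordered claim amounts and $\hat{F}$ is the fitted cumulative distribution function. Fig. 3 and 4 show how the fitted distributions compare with the actual data.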
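A minimal Python sketch of the K-S statistic above, applied here to a fitted Lognormal CDF written with the error function. The helper `ks_critical_value` uses the large-sample approximation $c(\alpha)/\sqrt{n}$ with $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ (about $1.63/\sqrt{n}$ at $\alpha = 0.01$); this approximation and all function names are assumptions, not necessarily what the paper used.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Fitted Lognormal CDF, Phi((ln x - mu) / sigma), written with math.erf."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(data, cdf):
    """KS = max(KS+, KS-) over the ordered sample x_(1) <= ... <= x_(n)."""
    xs = sorted(data)
    n = len(xs)
    # enumerate is 0-based, so (i + 1) / n is the 1-based i/n of the formula
    ks_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # and i / n is the 1-based (i - 1)/n of the formula
    ks_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(ks_plus, ks_minus)


def ks_critical_value(n, alpha=0.01):
    """Large-sample K-S critical value sqrt(-0.5 * ln(alpha / 2)) / sqrt(n) (assumed)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)
```

For a fitted model, `ks_statistic(claims, lambda x: lognormal_cdf(x, mu_hat, sigma_hat))` gives the $D$ value used to rank the fits, and a fit is rejected when $D$ exceeds `ks_critical_value(len(claims))`.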

Discussion
The results for this real motor claim data set show that the Lognormal model whose parameters were estimated by the VNS method, with initial estimates from MLE, has the smallest K-S statistic D (Table 4). For this data set, we therefore conclude that this model gives the best fit and that VNS gives more accurate parameter estimates than MOM and MLE in terms of the K-S statistic.
In the VNS algorithm of this study, a uniform random number u is used in the neighborhood structure of VNS ($N_k$), which proved to be a good choice; however, other distributions could be used to generate u and might give better performance. Moreover, while this research used Iterated Local Search for the local search step, future work could incorporate other meta-heuristics, such as Ant Colony Optimization or Simulated Annealing, into the VNS algorithm.

Conclusion
When the distributions of interest are complicated and have many parameters, estimation is not an easy task. For instance, the likelihood function of the Gamma distribution in the Maximum Likelihood Estimation Method is hard to maximize and has no closed-form solution. The proposed VNS methodology, which is computationally efficient and easy to implement, can solve this problem and performs well in terms of both estimation accuracy and computation time.

Author's Contributions
Samruam Chongcharoen: Participated in all steps of the research and contributed to the writing of the manuscript.
Kunjira Kingphai: Computer programming, analysis and interpretation of the data, and reporting and evaluating the research.

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.