Fitting of Finite Mixture Distributions to Motor Insurance Claims

Problem statement: The modeling of claims is an important task of actuaries. Our problem is in modeling an actual motor insurance claim data set. In this study, we show that actual motor insurance claims can be fitted by a finite mixture model. Approach: First, we analyze the actual data set and then choose finite mixture Lognormal distributions as our model. The estimated parameters of the model are obtained from the EM algorithm. Then, we use the K-S and A-D tests to show how well the finite mixture Lognormal distributions fit the actual data set. We also discuss the bootstrap technique for estimating the parameters. Results: From the tests, we found that the finite mixture Lognormal distributions fit the actual data set at significance level 0.10. Conclusion: Finite mixture Lognormal distributions can be fitted to motor insurance claims and the fit improves as the number of components (k) increases.


INTRODUCTION
Introduction and motivation: Many problems in actuarial science involve the building of a mathematical model that can be used to forecast or predict insurance costs. So modeling is an important procedure for actuaries so that they can estimate the degree of uncertainty as to when a claim will be made and how much will be paid. In particular, the modeling of claims and outstanding claims lead to the pricing of insurance premiums and an estimation of claim reserving, respectively. The most useful approach to uncertainty representation is through probability, so we will concentrate on probability models.
Losses depend on two random variables, i.e., the number of losses and the amount of loss which will occur in a specified period. The number of losses (claim number) is referred to as the frequency of loss (claim frequency) and its probability distribution is called the frequency distribution. The amount of loss (claim size) is referred to as the severity of loss (claim severity) and its probability distribution is called the severity distribution. Loss distributions and their modeling are described in detail in the book of Klugman et al. (2008) and the paper of Janczura and Weron (2010). Only the severity distribution is considered in this study.
The mixture of distributions is sometimes called compounding, which is extremely important as it can provide a superior fit. A successful use of this technique is illustrated in Hewitt and Lefkowitz (1979). In the 1960s and 1970s, finite mixture models appeared in the statistical literature and proved useful for modeling discrete unobserved heterogeneity in populations. Since there are many different modes for claim possibilities, a finite mixture model should work well.
The Expectation-Maximization (EM) algorithm is used to fit the model: it introduces unobserved indicators with the goal of maximizing the complete likelihood function. The EM algorithm is widely applicable for parameter estimation in mixture models. For more detail, see McLachlan and Peel (2000); Aitkin and Rubin (1985); Hogg et al. (2004) and Hogg and Klugman (1984).
The bootstrap process is a tool for fitting and it is not complicated to implement. Usually, the bootstrap process involves resampling with replacement from the residuals rather than from the data themselves. We apply the bootstrap technique to recalculate the estimated parameters for model fitting. For more detail, see Efron and Tibshirani (1993).
The purpose of this study is to find a statistical model for the claim severity. Many authors have investigated special distributions for severity claims and applied them to calculate insurance premiums. Recently, Mohamed et al. (2010) investigated a model of severity claims with a Pareto distribution and used it to calculate insurance premiums under a retention limit. Moreover, Brazauskas et al. (2009) suggested the Method of Trimmed Moments (MTM) for Lognormal and Pareto loss distributions and analyzed a real data set concerning hurricane damage in the United States. In our work, we proceed in the opposite direction, i.e., we find a model that fits the empirical data.
We consider the data from a set of motor insurance claims from the top three non-life insurance public companies in Thailand. A mixture model is fitted to the data and the estimated parameters for the model are calculated by the EM algorithm. We also use the bootstrap technique to fit the data and show that the bootstrap sample for observations is applicable to the estimated parameters.

MATERIALS AND METHODS
We present the statistical modeling for a finite mixture of Lognormal distributions, explain the EM algorithm and demonstrate the bootstrap technique.

Statistical modeling:
Right-skewed distributions such as the Gamma, Lognormal, Weibull and Pareto have often been used by actuaries to fit claim sizes; see Klugman et al. (2008). In insurance companies, there are two types of claim data recording, i.e., individual and group data. We model the individual claim data. Some assumptions and notation are specified below.
Assumption 1 (Policy independence): Consider n different policies (contracts). Let X_i denote the response for policy i. Then X_1, …, X_n are independent.
Assumption 2 (Time independence): Consider n disjoint time intervals. Let X_i denote the response in time interval i. Then X_1, …, X_n are independent.
Assumption 3 (Homogeneity): Consider any two policies in the same tariff cell having the same exposure. Let X_i denote the response for policy i. Then X_1 and X_2 have the same probability distribution.
Assumption 4: A recorded claim is equal to an actual claim (observation).

Single parametric distribution:
On the basis of the analyst's knowledge, experience and statistical tests, the Lognormal distribution is our choice for modeling and fitting the data set. Maximum Likelihood Estimation (MLE) is used to estimate the parameters.
The model: Assume that X ∼ Lognormal(µ, σ), abbreviated X ~ LN(µ, σ), with density (Eq. 1):

f(x; µ, σ) = 1/(xσ√(2π)) exp(−(ln x − µ)²/(2σ²)), x > 0

Estimation for the model: Let x = (x_1, …, x_n) be independent observations, where x_i is the amount paid for the i-th contract. We fit the Lognormal distribution in Eq. 1 to the data set by MLE.
The likelihood function is:

L(µ, σ) = ∏_{i=1}^{n} f(x_i; µ, σ)

Maximizing L yields the maximum likelihood estimates μ̂ and σ̂ of µ and σ:

μ̂ = (1/n) ∑_{i=1}^{n} ln x_i,  σ̂² = (1/n) ∑_{i=1}^{n} (ln x_i − μ̂)²

Finite mixture models: We consider second-order and higher-order finite mixture models. We aim to find the mixing weights according to the number of Lognormal components, with the parameters estimated by MLE via the EM algorithm.
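Since μ̂ and σ̂ have closed forms, the single-distribution fit reduces to a few lines. A minimal sketch in Python (the claim amounts below are made up for illustration):

```python
import numpy as np

def fit_lognormal(x):
    """Closed-form MLE for LN(mu, sigma): mu_hat is the mean of the
    log-claims and sigma_hat^2 their (1/n) sample variance."""
    logx = np.log(np.asarray(x, dtype=float))
    mu = logx.mean()
    sigma = np.sqrt(((logx - mu) ** 2).mean())  # note 1/n, not 1/(n-1)
    return mu, sigma

# hypothetical claim amounts, for illustration only
claims = [1200.0, 3400.0, 560.0, 8900.0, 2300.0]
mu_hat, sigma_hat = fit_lognormal(claims)
```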
The model: Let the j-th component be LN(µ_j, σ_j) with density f_j(x; θ_j), where θ_j = (µ_j, σ_j). Then the probability density function of the k-component mixture is:

f(x; ψ) = ∑_{j=1}^{k} τ_j f_j(x; θ_j)

where 0 < τ_j < 1 for j = 1, …, k and τ_1 + … + τ_k = 1. The likelihood function can be written as:

L(ψ) = ∏_{i=1}^{n} ∑_{j=1}^{k} τ_j f_j(x_i; θ_j)

and the log-likelihood function is:

ℓ(ψ) = ∑_{i=1}^{n} ln ∑_{j=1}^{k} τ_j f_j(x_i; θ_j)

Introducing component indicators z_ij (equal to 1 if observation x_i comes from component j and 0 otherwise), the complete likelihood takes quite a simple form:

L_c(ψ) = ∏_{i=1}^{n} ∏_{j=1}^{k} [τ_j f_j(x_i; θ_j)]^{z_ij}

and the complete log-likelihood function is (Eq. 3):

ℓ_c(ψ) = ∑_{i=1}^{n} ∑_{j=1}^{k} z_ij [ln τ_j + ln f_j(x_i; θ_j)]

For k components, there are 3k − 1 unknown parameters to be estimated by the EM algorithm. We use a computer for the calculation of the parameters and for visualization of the fitted model. The proper number of components to be included in the mixture model will also be considered.
Note that T_ij is the posterior probability that observation x_i comes from the j-th component. By Bayes' theorem, T_ij is given by:

T_ij = τ_j f_j(x_i; θ_j) / ∑_{l=1}^{k} τ_l f_l(x_i; θ_l)

Replacing the indicator z_ij in Eq. 3 by T_ij gives the expected complete log-likelihood (Eq. 4), which we maximize to estimate ψ. Solving the first-order conditions for the mixing weights has the same form as the MLE for the multinomial distribution, so:

τ̂_j = (1/n) ∑_{i=1}^{n} T_ij

and the first-order conditions of Eq. 4 for θ_j = (µ_j, σ_j) imply:

µ̂_j = ∑_{i=1}^{n} T_ij ln x_i / ∑_{i=1}^{n} T_ij,  σ̂_j² = ∑_{i=1}^{n} T_ij (ln x_i − µ̂_j)² / ∑_{i=1}^{n} T_ij

For a given set of parameters ψ, i.e., θ_j = (µ_j, σ_j), j = 1, 2, …, k and τ = (τ_1, …, τ_{k−1}), the E-step consists of calculating T_ij for the M-step, and the M-step consists of maximizing the expected complete log-likelihood function. The E-step and M-step are repeated in an alternating fashion until the expected complete log-likelihood fails to increase; at that point, a final M-step produces the estimate of ψ. Otherwise, we return to the E-step for the next iteration. After the m-th iteration, the EM algorithm proceeds as follows:

E-step: Given the current estimate ψ^(m) of the parameters after the m-th iteration, compute T_ij^(m). The E-step results in the function (Eq. 5):

Q(ψ; ψ^(m)) = ∑_{i=1}^{n} ∑_{j=1}^{k} T_ij^(m) [ln τ_j + ln f_j(x_i; θ_j)]

M-step:
Maximize Q(ψ; ψ^(m)) over ψ to obtain ψ^(m+1). Taking the partial derivatives of Eq. 5 with respect to ψ and equating them to zero yields the closed-form updates for τ_j, µ_j and σ_j given above.

Observation bootstrap:
Let x* = (x*_1, x*_2, …, x*_n) be a bootstrap sample drawn with replacement from the observations (Eq. 6). Then we recalculate the estimated parameters, µ* and σ*, by MLE based on x*.
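The E-step and M-step described above can be combined into a compact routine. A mixture of Lognormals in x is a mixture of Normals in ln x, so the sketch below (an illustrative implementation, not the authors' code; the random initialisation and stopping tolerance are assumptions) works on the log scale:

```python
import numpy as np

def em_mixture_lognormal(x, k, n_iter=200, tol=1e-8, seed=0):
    """EM for a k-component mixture of Lognormal distributions.
    On the log scale each component is Normal(mu_j, sigma_j), so the
    E- and M-step updates take the closed forms derived above."""
    rng = np.random.default_rng(seed)
    y = np.log(np.asarray(x, dtype=float))
    n = y.size
    # crude initialisation (an assumption; k-means seeding is an alternative)
    mu = rng.choice(y, size=k, replace=False)
    sigma = np.full(k, y.std() + 1e-6)
    tau = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probabilities T_ij via Bayes' theorem
        log_pdf = (-0.5 * ((y[:, None] - mu) / sigma) ** 2
                   - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
        log_w = np.log(tau) + log_pdf
        log_norm = np.logaddexp.reduce(log_w, axis=1, keepdims=True)
        T = np.exp(log_w - log_norm)
        # M-step: closed-form updates for tau_j, mu_j, sigma_j
        Nj = T.sum(axis=0)
        tau = Nj / n
        mu = (T * y[:, None]).sum(axis=0) / Nj
        sigma = np.sqrt((T * (y[:, None] - mu) ** 2).sum(axis=0) / Nj)
        ll = log_norm.sum()
        if ll - prev_ll < tol:  # stop when the log-likelihood fails to increase
            break
        prev_ll = ll
    return tau, mu, sigma
```

Working with log-densities and `logaddexp` avoids underflow when a claim lies far in the tail of every component.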

Residual bootstrap:
There are many forms of the residual definition and it is important to use an appropriate one for each problem. We considered some alternatives, such as the unscaled Pearson residual and the unscaled Anscombe residual, but these forms are not suitable for our data. Hence, we define the residual as:

ε_i = ln x_i − μ̂, i = 1, 2, …, n

where ε_i is the residual and μ̂ is the maximum likelihood estimate of µ. Let ε* = (ε*_1, …, ε*_n) be a resample of the residuals drawn with replacement. By using the bootstrap technique, we obtain the bootstrap data samples (Eq. 7):

ln x*_i = ε*_i + μ̂, i = 1, 2, …, n

We recalculate the estimated parameters, µ* and σ*, by MLE based on ln x*_i, i = 1, 2, …, n.
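A sketch of this residual bootstrap under the single Lognormal fit (illustrative only; the replicate count, seed and claim amounts are arbitrary choices):

```python
import numpy as np

def residual_bootstrap(x, n_boot=1000, seed=0):
    """Residual bootstrap for the single Lognormal fit: the residuals
    eps_i = ln x_i - mu_hat are resampled with replacement and the
    parameters refitted by MLE on ln x*_i = eps*_i + mu_hat."""
    rng = np.random.default_rng(seed)
    logx = np.log(np.asarray(x, dtype=float))
    mu_hat = logx.mean()
    eps = logx - mu_hat                       # residuals on the log scale
    reps = np.empty((n_boot, 2))
    for b in range(n_boot):
        logx_star = mu_hat + rng.choice(eps, size=eps.size)  # with replacement
        mu_star = logx_star.mean()
        sigma_star = np.sqrt(((logx_star - mu_star) ** 2).mean())
        reps[b] = (mu_star, sigma_star)
    return reps                               # columns: mu*, sigma*
```

The spread of the columns of the returned array gives bootstrap standard errors for μ̂ and σ̂.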

Goodness of fit test:
The Goodness of Fit (GOF) test measures the compatibility of a random sample with a theoretical probability distribution function. We use the Kolmogorov-Smirnov test (K-S test) and the Anderson-Darling test (A-D test) to show how well the distribution fits our data set. The K-S test is used to decide whether a sample comes from a hypothesized continuous distribution. It is based on the Empirical Cumulative Distribution Function (ECDF):

F_n(x) = (1/n) · #{i : x_i ≤ x}

The K-S test statistic is defined by:

D = sup_x |F_n(x) − F*_X(x)|

The A-D test is a general test that compares an observed cumulative distribution function to an expected cumulative distribution function. This test gives more weight to the tails than the K-S test.
The A-D test statistic is defined as:

A² = −n − (1/n) ∑_{i=1}^{n} (2i − 1)[ln F*_X(x_(i)) + ln(1 − F*_X(x_(n+1−i)))]

where F*_X is the theoretical cumulative distribution function of the distribution being tested and x_(1) ≤ … ≤ x_(n) are the ordered observations.
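Both statistics can be computed directly from the ordered sample. A minimal sketch for a fully specified continuous CDF:

```python
import numpy as np

def ks_ad_statistics(x, cdf):
    """K-S statistic D = sup|F_n - F*| and A-D statistic A^2 for a
    fully specified continuous CDF `cdf`."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    F = cdf(x)
    # K-S: compare F* against the ECDF just above and just below each x_(i)
    D = max((np.arange(1, n + 1) / n - F).max(), (F - np.arange(n) / n).max())
    # A-D: -n - (1/n) sum (2i-1)[ln F*(x_(i)) + ln(1 - F*(x_(n+1-i)))]
    i = np.arange(1, n + 1)
    A2 = -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1])))
    return D, A2
```

For a fitted LN(μ̂, σ̂), the CDF to pass in can be built with SciPy as `scipy.stats.lognorm(s=sigma_hat, scale=np.exp(mu_hat)).cdf`.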

RESULTS
The finite mixture of Lognormal distributions is applied to the actual set of claims data and the bootstrap procedure is analyzed. An analysis and some comparisons are shown with respect to statistical tests.

An application:
We fitted the finite mixture of Lognormal distributions to the data set, which was provided by a non-life insurance company in Thailand. We considered it for both the whole portfolio and various types of product coverage. The Kolmogorov-Smirnov test and the Anderson-Darling test are used to assess the model fit.

Motor insurance data set:
We consider the data set of motor insurance claims for the year 2009; all types of vehicle, i.e., automobiles, lorries and motorcycles, are included. The total of each claim amount is paid by the insurer. The data set is classified by product coverage type-i for i = 0, 1, …, 5. There are 1,296 observations of type-5 that meet the mixture Lognormal distributions. The historical data of severity claims and the histogram of severity claims (log scale) are illustrated in Fig. 1 and 2, respectively. Tables 1 and 2 show the statistical test values for fitting the finite mixture Lognormal distributions to the data set. For both the K-S and A-D tests, the summaries are as follows.
Case 1: at a significance level of α = 0.05, we obtain the estimated parameters μ̂ = 8.9672 and σ̂ = 1.1804, and the single Lognormal distribution does not fit type-5, while the mixture Lognormal distributions fit type-5 when the number of components k is greater than or equal to 20.
Case 2: at a significance level of α = 0.10, the mixture Lognormal distributions fit type-5 when k is 25 or greater. In general, the k-component mixture fits type-5 better as k increases. The maximum number of components is 130, since more than 130 components is not applicable to k-means clustering.
From Table 2, by the A-D test, we can see that the A² values decrease as k increases, while the D value does not. Figures 7 and 8 show the P-P plots of the finite mixture Lognormal distributions for k = 1 and k = 100, respectively. A bootstrap data sample can be calculated using Eq. 6 and 7 for the observation and residual bootstrap, respectively. The Lognormal distribution was refitted to the data set after recalculating the estimated parameters from the bootstrap process. We found that the Lognormal distribution fits type-5 at a significance level of α = 0.01 for both the K-S and A-D tests; by the K-S test, it fits type-5 at a significance level of α = 0.10. Some examples are given in Table 3.
From Table 3, we can see that the bootstrap technique is applicable for refitting the model to the data set. Note that the residual bootstrap provides better A² values in a shorter computer run time than the observation bootstrap.

DISCUSSION
In further research, we should consider an infinite mixture of Lognormal distributions (an uncountable family) to alleviate the problem of choosing the number of components, and the fitting of truncated and/or censored data sets should also be considered.
The model presented fits the claim amounts. Actuaries can use it to determine which estimated parameters are acceptable or which distribution functions are suitable for their work. The bootstrap technique estimates the parameters easily and quickly. The finite mixture model makes the approach moderately useful for heavy-tailed (fat-tailed) distributions.

CONCLUSION
The finite mixture of Lognormal distributions can be fitted to the set of actual claim data where a single Lognormal distribution cannot. The mixture of Lognormal distributions fits product type-5 very well. A limitation of the finite mixture model is that the number of components depends on k-means clustering, so the number of components used for computing the estimated parameters must be chosen carefully. The parameters of the Lognormal distribution estimated by the bootstrap method fit the data according to the K-S test, although the bootstrap process does not fit the tail as well as the finite mixture of Lognormal distributions does.