Handling the Dependence of Claim Severities with Copula Models

Problem statement: Several studies have been carried out on the modeling of claim severity data in actuarial literature as well as in insurance practice. Since it is well established that claim cost distributions generally have positive support and are positively skewed, the Gamma and Lognormal regression models have been used by practitioners for modeling claim severities. However, fitting claim severities via regression models assumes that the claim types are independent. Approach: In this study, the independence assumption between claim types is investigated. We consider three types of Malaysian motor insurance claims, namely Third Party Body Injury (TPBI), Third Party Property Damage (TPPD) and Own Damage (OD), and apply the normal, t, Frank and Clayton copulas for modeling the dependence structures between these claim types. Results: The AIC and BIC indicate that the Clayton is the best copula for modeling dependence between TPBI and OD claims and between TPPD and OD claims, whereas the t-copula is the best copula for modeling dependence between TPBI and TPPD claims. Conclusion: This study modeled the dependence between insurance claim types using copulas on the Malaysian motor insurance claim severity data. The main advantage of using a copula is that each marginal distribution can be specified independently, based on the distribution of the individual variable, and then joined by the copula, which takes into account the dependence between these variables. Based on the results, the estimates of the copula parameter for claim severities indicate that the dependence between claim types is significant.


INTRODUCTION
Premium pricing for fire, motor and workmen's compensation insurance in Malaysia is governed by the respective tariffs formulated by the General Insurance Association of Malaysia (PIAM). The main objective of the tariffs is to guarantee that premium rates are at least at the level required by the Malaysian government, ensuring that price competition among local insurers stays above the market's economic level. However, one of the effects of the world economic crisis in 1997 is the gradual spread of liberalization across most financial sectors in Malaysia, including the non-life insurance sector. Therefore, thorough and comprehensive preparation for a more mature and open insurance market should be undertaken by the sector and the regulators concerned. In achieving this target, one of the main tasks that should be given serious attention is the determination of appropriate premium rates, especially in low-premium, high-volume non-life insurance businesses, which can be accomplished via statistical modeling.
Statistical modeling of premium rates requires two crucial estimates: the probability of occurrence of insured events, namely claim frequency, and the magnitude of such events, namely claim severity.
Claim frequency can be defined as the number of claims per exposure unit, whereas claim severity is the average cost per claim. In the actuarial literature, statistical estimates of claim frequency and severity are often calculated by grouping risks with similar risk characteristics for the purpose of establishing a "fair" premium price, a process known as risk classification.
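The two definitions above can be illustrated with a short Python sketch. The figures are hypothetical and are not taken from the Malaysian data; the product of frequency and severity is shown only as the usual expected claim cost per exposure unit, not as the rating method of this study.

```python
def claim_frequency(n_claims, exposure):
    """Number of claims per exposure unit."""
    return n_claims / exposure


def claim_severity(total_cost, n_claims):
    """Average cost per claim."""
    return total_cost / n_claims


# Hypothetical risk class (illustrative figures only).
exposure = 1000.0       # exposure units, e.g. policy-years
n_claims = 80           # number of claims in the class
total_cost = 240_000.0  # total claim cost in the class

freq = claim_frequency(n_claims, exposure)
sev = claim_severity(total_cost, n_claims)

# Expected claim cost per exposure unit.
expected_cost_per_unit = freq * sev
```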
Several studies have been carried out on the modeling of claim severity in actuarial literature as well as in insurance practice. Since it is well established that claim cost distributions generally have positive support and are positively skewed, the Gamma and Lognormal distributions have been used by practitioners for modeling claim severities. For example, McCullagh and Nelder (1989) fit the UK own damage costs for privately owned and comprehensively insured vehicles using a Gamma regression model, assuming that the coefficient of variation is constant within classes and incorporating the mean via an inverse link function; Brockman and Wright (1992) fit the UK own damage costs for comprehensive motor policies to the Gamma regression via a log link function; Renshaw (1994) fit the UK motor insurance claim severity to the Gamma regression, also via a log link function; and Ismail and Jemain (2009) fit the Gamma and Inverse Gaussian regressions via the log, linear and inverse link functions to the Malaysian motor claim costs data. As a comparison, several actuarial studies also reported claim severity results from the Normal distribution via the Box-Cox transformation; one such example can be found in Harrington (1986), who fitted two sets of motor insurance data, the UK and the Massachusetts data.
However, the fitting of claim severities through regression models assumes that the claim types are independent. In this study, this assumption is investigated. We consider three types of Malaysian motor insurance claims, namely Third Party Body Injury (TPBI), Third Party Property Damage (TPPD) and Own Damage (OD), and apply copulas for modeling the dependence structures between these claim types. In other words, instead of implementing a traditional univariate claim analysis, we perform a bivariate analysis of claim severity data, taking into account the possibility that the damage in an accident results in more than one claim type and the impact of the dependence of one claim type on another claim type arising from the same accident.
A copula model expresses the joint distribution of two or more random variables by separating it into two components: the marginal distributions of the individual variables and the dependence structure that links them. An advantage of the copula is that each marginal distribution can be specified in isolation from the others and then joined by the copula. Copula models have been applied in several areas such as finance, insurance and environmental studies. In actuarial literature, Frees and Valdez (1998) and Klugman and Parsa (1999) applied copulas for modeling claim sizes and allocated loss adjustment expenses, Frees and Wang (2005) handle serial time dependence through the t-copula by assuming that the marginal distribution for claim severity data follows a Generalized Linear Model (GLM), and Frees and Wang (2006) model time dependence for count data by using elliptical copulas. An introduction to copulas can be found in Frees and Valdez (1998).
The objective of this study is to model the dependence between insurance claim types using copulas. The copula models are applied to the Malaysian motor insurance claim severity data, which are divided into three types, namely TPBI, TPPD and OD. Specifically, two stages of fitting are involved. First, the TPBI, TPPD and OD claim severities are fitted independently to the Gamma and Inverse Gaussian regression models. Then, to investigate the dependence between claim types, the TPBI, TPPD and OD claim severities are fitted, compared and tested on the normal and t-copulas, which belong to the elliptical family, and on the Clayton and Frank copulas, which belong to the Archimedean family.

MATERIALS AND METHODS
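Gamma regression model: Let $C_i$ be the random variable for claim severity, or equivalently the claim cost, for the $i$-th risk class, $i = 1, 2, \ldots, n$, where $n$ denotes the number of risk classes. If $C_i$ follows a gamma distribution, the probability density function (pdf) is:
$$f(c_i) = \frac{1}{c_i\,\Gamma(v)} \left(\frac{v c_i}{\mu_i}\right)^{v} \exp\left(-\frac{v c_i}{\mu_i}\right), \quad c_i > 0,$$
with mean $E(C_i) = \mu_i$ and variance $\mathrm{Var}(C_i) = \mu_i^2 / v$, where $v$ denotes the index parameter. To incorporate covariates and to ensure non-negativity, the mean is included in the regression model via a log link function,
$$\mu_i = \exp(\mathbf{x}_i^{T} \boldsymbol{\beta}),$$
where $\mathbf{x}_i$ denotes the vector of covariates for the $i$-th risk class and $\boldsymbol{\beta}$ the vector of regression parameters.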
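As a minimal sketch of the Gamma regression model, the following Python code evaluates the Gamma density in the mean and index parameterization, assumed here to be $f(c) = \frac{1}{c\,\Gamma(v)} (vc/\mu)^{v} \exp(-vc/\mu)$, and computes the mean through a log link. The function names, the covariate vector and the coefficient values are hypothetical and are not estimates from the study.

```python
import math


def gamma_pdf(c, mu, v):
    """Gamma density with mean mu and index (shape) parameter v.

    Assumed form: f(c) = (1 / (c * Gamma(v))) * (v*c/mu)**v * exp(-v*c/mu).
    """
    if c <= 0:
        return 0.0
    z = v * c / mu
    # Work on the log scale to avoid overflow for large v.
    return math.exp(v * math.log(z) - z - math.lgamma(v)) / c


def log_link_mean(x, beta):
    """Mean under a log link, mu = exp(x' beta), which is always positive."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))


# Hypothetical risk class: intercept plus one dummy covariate.
beta = [8.0, -0.25]   # illustrative coefficients only
x = [1.0, 1.0]
mu = log_link_mean(x, beta)
density = gamma_pdf(2000.0, mu, v=2.0)
```

With $v = 1$ the density reduces to the exponential density with mean $\mu$, which gives a quick sanity check of the parameterization.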

Inverse Gaussian regression model: If $C_i$ follows an Inverse Gaussian distribution, the pdf is:
$$f(c_i) = \frac{1}{\sqrt{2\pi\sigma^2 c_i^3}} \exp\left(-\frac{(c_i - \mu_i)^2}{2\sigma^2 \mu_i^2 c_i}\right), \quad c_i > 0,$$
with mean $E(C_i) = \mu_i$ and variance $\mathrm{Var}(C_i) = \sigma^2 \mu_i^3$, where $\sigma^2$ denotes the dispersion parameter.
Normal copula: The idea of Sklar's Theorem for a two-dimensional cumulative distribution function (cdf), $F$, is to split the function into two parts: the marginal cdfs, $F_i$, $i = 1, 2$, and the copula, $H$, which describes the form of dependence in the distribution. $F_i$ and $H$ are connected by:
$$F(c_1, c_2) = H(F_1(c_1), F_2(c_2)) = H(u_1, u_2),$$
where $u_i = F_i(c_i)$ and $U_1 = F_1(C_1)$ and $U_2 = F_2(C_2)$ denote standard uniform random variables. By differentiation, the corresponding probability density function (pdf) is given by:
$$f(c_1, c_2) = f_1(c_1)\, f_2(c_2)\, h(F_1(c_1), F_2(c_2)),$$
where $f_i$ is the marginal pdf and $h$ the copula pdf. We fit two families of copulas: the elliptical and the Archimedean. An elliptical copula corresponds to an elliptical distribution through Sklar's Theorem. Let $F$ be the multivariate cdf of an elliptical distribution, $F_i$ the cdf of the $i$-th margin and $F_i^{-1}$, $i = 1, 2$, its inverse function (the quantile function). The elliptical copula is:
$$H(u_1, u_2) = F(F_1^{-1}(u_1), F_2^{-1}(u_2)).$$
The copula of a normal joint cdf is called the normal copula. If $\rho$ is the correlation parameter and $x_i = \Phi^{-1}(u_i)$, where $\Phi^{-1}$ is the standard normal quantile function, the pdf of a normal copula is given by:
$$h(u_1, u_2) = \frac{1}{\sqrt{1 - \rho^2}} \exp\left(-\frac{\rho^2 (x_1^2 + x_2^2) - 2\rho x_1 x_2}{2(1 - \rho^2)}\right).$$
The likelihood function depends on both the copula and the marginals. As an example, if the observed severities arise from the first and second claim types, with marginal density functions $f_1(c_1)$ and $f_2(c_2)$ respectively, the contribution to the likelihood can be written as:
$$f_1(c_1)\, f_2(c_2)\, h(F_1(c_1), F_2(c_2)).$$
The marginal parameters and the copula parameter (correlation parameter) can be obtained using the maximum likelihood procedure.
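The normal copula density and the likelihood contribution can be sketched in Python as follows. The sketch assumes the standard bivariate normal copula density $h(u_1, u_2) = (1-\rho^2)^{-1/2} \exp\{-[\rho^2(x_1^2 + x_2^2) - 2\rho x_1 x_2] / [2(1-\rho^2)]\}$ with $x_i = \Phi^{-1}(u_i)$. The exponential marginals are hypothetical stand-ins chosen only to keep the example short; they are not the Gamma or inverse Gaussian regression marginals fitted in the study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def normal_copula_pdf(u1, u2, rho):
    """Bivariate normal copula density h(u1, u2) with correlation rho."""
    x1 = _STD_NORMAL.inv_cdf(u1)
    x2 = _STD_NORMAL.inv_cdf(u2)
    det = 1.0 - rho * rho
    expo = -(rho * rho * (x1 * x1 + x2 * x2) - 2.0 * rho * x1 * x2) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


def log_lik_contribution(c1, c2, f1, F1, f2, F2, rho):
    """Log of f1(c1) * f2(c2) * h(F1(c1), F2(c2)) for one pair of severities."""
    return (math.log(f1(c1)) + math.log(f2(c2))
            + math.log(normal_copula_pdf(F1(c1), F2(c2), rho)))


# Hypothetical exponential marginals (stand-ins, not the study's marginals).
f1 = lambda c: 0.5 * math.exp(-0.5 * c)
F1 = lambda c: 1.0 - math.exp(-0.5 * c)
f2 = lambda c: 0.2 * math.exp(-0.2 * c)
F2 = lambda c: 1.0 - math.exp(-0.2 * c)

ll = log_lik_contribution(1.5, 4.0, f1, F1, f2, F2, rho=0.3)
```

Summing such contributions over all observed pairs gives the log likelihood that a numerical optimizer would maximize over the marginal parameters and $\rho$. With $\rho = 0$ the copula density equals 1 and the contribution reduces to the independent case.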

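t-copula: The copula of a Student t joint cdf is called the t-copula. If $\rho$ is the correlation parameter, $v$ is the degrees of freedom and $x_i = t_v^{-1}(u_i)$, where $t_v^{-1}$ is the quantile function of the univariate Student t distribution with $v$ degrees of freedom, the pdf is given by:
$$h(u_1, u_2) = \frac{1}{\sqrt{1 - \rho^2}}\, \frac{\Gamma\left(\frac{v+2}{2}\right) \Gamma\left(\frac{v}{2}\right)}{\Gamma\left(\frac{v+1}{2}\right)^{2}}\, \frac{\left(1 + \frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{v(1 - \rho^2)}\right)^{-(v+2)/2}}{\prod_{i=1}^{2} \left(1 + \frac{x_i^2}{v}\right)^{-(v+1)/2}}.$$
Similar to the normal copula, the marginal parameters and the copula parameters (the correlation parameter and the degrees of freedom) for the t-copula can be obtained using the maximum likelihood procedure.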
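A minimal Python sketch of the t-copula density is given below. It assumes the standard form, namely the bivariate Student t density divided by the product of the two univariate Student t densities, and it takes the t-scores $x_i = t_v^{-1}(u_i)$ as inputs because the Python standard library has no Student t quantile function; in practice the quantile transform would come from a statistics library. The function name is hypothetical.

```python
import math


def t_copula_pdf_from_scores(x1, x2, rho, v):
    """t-copula density evaluated at t-scores x1, x2.

    x1 and x2 are assumed to be Student t quantiles (v degrees of freedom)
    of the uniform margins u1 and u2; they are computed elsewhere.
    """
    det = 1.0 - rho * rho
    # Log of the ratio of normalizing constants (bivariate / univariate^2).
    log_const = (math.lgamma((v + 2.0) / 2.0) + math.lgamma(v / 2.0)
                 - 2.0 * math.lgamma((v + 1.0) / 2.0) - 0.5 * math.log(det))
    quad = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (v * det)
    log_joint = -(v + 2.0) / 2.0 * math.log1p(quad)
    log_margins = -(v + 1.0) / 2.0 * (math.log1p(x1 * x1 / v)
                                      + math.log1p(x2 * x2 / v))
    return math.exp(log_const + log_joint - log_margins)


# Illustrative evaluation with hypothetical parameter values.
value = t_copula_pdf_from_scores(0.8, 1.1, rho=0.4, v=5.0)
```

As the degrees of freedom grow, the value at a fixed pair of scores approaches the normal copula density at the same scores, which is one way to check the implementation.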
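Clayton copula: An Archimedean copula is constructed through a generator, $\phi$, as:
$$H(u_1, u_2) = \phi^{-1}(\phi(u_1) + \phi(u_2)),$$
where $\phi^{-1}$ is the inverse of the generator and $u_1$ and $u_2$ are values of the standard uniform random variables $U_1$ and $U_2$. A generator uniquely determines an Archimedean copula. The generator of the Clayton copula with space parameter $\alpha > 0$ is given by:
$$\phi(u) = u^{-\alpha} - 1,$$
and the inverse is:
$$\phi^{-1}(t) = (1 + t)^{-1/\alpha}.$$
From the generator, $\phi$, and the inverse, $\phi^{-1}$, the Clayton copula can be obtained:
$$H(u_1, u_2) = \left(u_1^{-\alpha} + u_2^{-\alpha} - 1\right)^{-1/\alpha},$$
and the pdf is:
$$h(u_1, u_2) = (1 + \alpha)\,(u_1 u_2)^{-\alpha - 1} \left(u_1^{-\alpha} + u_2^{-\alpha} - 1\right)^{-1/\alpha - 2}.$$
The marginal parameters and the copula parameter (space parameter) can be obtained using the maximum likelihood procedure.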
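The construction of the Clayton copula from its generator can be sketched in Python as follows. The sketch assumes the common generator $\phi(u) = u^{-\alpha} - 1$ with $\alpha > 0$ (other scalings of the generator give the same copula); the function names and the parameter value are illustrative.

```python
def clayton_generator(u, alpha):
    """Assumed Clayton generator: phi(u) = u**(-alpha) - 1, alpha > 0."""
    return u ** (-alpha) - 1.0


def clayton_generator_inv(t, alpha):
    """Inverse generator: phi^{-1}(t) = (1 + t)**(-1/alpha)."""
    return (1.0 + t) ** (-1.0 / alpha)


def archimedean_cdf(u1, u2, phi, phi_inv):
    """Archimedean copula H(u1, u2) = phi^{-1}(phi(u1) + phi(u2))."""
    return phi_inv(phi(u1) + phi(u2))


def clayton_cdf(u1, u2, alpha):
    """Closed-form Clayton copula."""
    return (u1 ** (-alpha) + u2 ** (-alpha) - 1.0) ** (-1.0 / alpha)


def clayton_pdf(u1, u2, alpha):
    """Clayton copula density (mixed partial derivative of the cdf)."""
    s = u1 ** (-alpha) + u2 ** (-alpha) - 1.0
    return (1.0 + alpha) * (u1 * u2) ** (-alpha - 1.0) * s ** (-1.0 / alpha - 2.0)


alpha = 2.0  # illustrative value of the space parameter
via_generator = archimedean_cdf(
    0.4, 0.7,
    lambda u: clayton_generator(u, alpha),
    lambda t: clayton_generator_inv(t, alpha),
)
closed_form = clayton_cdf(0.4, 0.7, alpha)  # agrees with via_generator
```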
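Frank copula: The generator of the Frank copula with space parameter $\alpha \neq 0$ is given by:
$$\phi(u) = -\ln\left(\frac{e^{-\alpha u} - 1}{e^{-\alpha} - 1}\right),$$
and the inverse is:
$$\phi^{-1}(t) = -\frac{1}{\alpha} \ln\left(1 + e^{-t}\left(e^{-\alpha} - 1\right)\right).$$
From the generator, $\phi$, and the inverse, $\phi^{-1}$, the Frank copula can be obtained:
$$H(u_1, u_2) = -\frac{1}{\alpha} \ln\left(1 + \frac{\left(e^{-\alpha u_1} - 1\right)\left(e^{-\alpha u_2} - 1\right)}{e^{-\alpha} - 1}\right),$$
and the pdf is:
$$h(u_1, u_2) = \frac{\alpha \left(1 - e^{-\alpha}\right) e^{-\alpha (u_1 + u_2)}}{\left[\left(1 - e^{-\alpha}\right) - \left(1 - e^{-\alpha u_1}\right)\left(1 - e^{-\alpha u_2}\right)\right]^{2}}.$$
Similar to the Clayton copula, the marginal parameters and the copula parameter (space parameter) for the Frank copula can be obtained using the maximum likelihood procedure.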
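A corresponding Python sketch for the Frank copula is shown below, assuming the standard closed forms $H(u_1, u_2) = -\alpha^{-1} \ln\{1 + (e^{-\alpha u_1} - 1)(e^{-\alpha u_2} - 1)/(e^{-\alpha} - 1)\}$ and the density obtained by differentiating it. The function names and the parameter value are illustrative.

```python
import math


def frank_cdf(u1, u2, alpha):
    """Frank copula H(u1, u2) for a nonzero space parameter alpha."""
    num = math.expm1(-alpha * u1) * math.expm1(-alpha * u2)
    return -math.log1p(num / math.expm1(-alpha)) / alpha


def frank_pdf(u1, u2, alpha):
    """Frank copula density (mixed partial derivative of the cdf)."""
    a = -math.expm1(-alpha)          # 1 - exp(-alpha)
    b1 = -math.expm1(-alpha * u1)    # 1 - exp(-alpha * u1)
    b2 = -math.expm1(-alpha * u2)    # 1 - exp(-alpha * u2)
    return alpha * a * math.exp(-alpha * (u1 + u2)) / (a - b1 * b2) ** 2


alpha = 3.0  # illustrative value of the space parameter
cdf_value = frank_cdf(0.4, 0.7, alpha)
pdf_value = frank_pdf(0.4, 0.7, alpha)
```

Unlike the Clayton copula, the Frank copula is symmetric in its two tails, so comparing the two fits gives some indication of whether the dependence is concentrated among small claims.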

RESULTS
The database for the Malaysian claim severities, which is supplied by Insurance Services Malaysia Berhad (ISM), provides information on private car insurance portfolios for the years 2000-2003. The sample data contain 572,627 policies with 52,522 (9.17%) claims, which can be categorized into three claim types: OD, TPPD and TPBI. The risk of each claim is associated with four rating factors, namely scope of coverage, vehicle make, vehicle cubic capacity and vehicle year; the rating factors and classes are shown in Table 1. The best Gamma and inverse Gaussian regression models for the OD, TPPD and TPBI claims are presented in Tables 2-4.
The dependence between claim types is investigated by fitting the marginal distributions first, followed by the copula models, so that the parameter estimates obtained from fitting the marginals can be used as initial values for estimating the parameters in the copula models. In particular, the marginal distributions for the OD, TPPD and TPBI severities are the inverse Gaussian, Gamma and Gamma regression models respectively. As for the copula models, the claim severities are fitted to the normal, t, Frank and Clayton copulas. Tables 5-7 summarize the results of fitting the copula models to the TPBI-OD claims, TPBI-TPPD claims and TPPD-OD claims respectively.

DISCUSSION
Based on the results in Table 2, the rating factors for scope of coverage and vehicle cubic capacity are significant for OD claim severities. In particular, the risks for non-comprehensive coverage and for vehicles with 1301-1800 c.c. are lower compared to others. The log likelihood, AIC and BIC show that the inverse Gaussian is a better model than the Gamma.
The results in Table 3 show that the rating factors for scope of coverage, vehicle cubic capacity and vehicle year are significant for TPPD claim severities. Specifically, non-comprehensive coverage, vehicles with more than 1300 c.c. and vehicles aged 0-1 and 4+ years have lower risks. Based on the log likelihood, AIC and BIC, the Gamma is a better model than the inverse Gaussian.
Similar to the TPPD claim severities, the significant rating factors for TPBI claim severities, as shown in Table 4, are scope of coverage, vehicle cubic capacity and vehicle year. Comparison based on the log likelihood, AIC and BIC shows that the Gamma is a better model than the inverse Gaussian. For the Gamma model, the risks for non-comprehensive coverage, vehicles with 1301+ c.c. and vehicles aged 0-1 year are lower compared to others.
The estimates of the copula parameter (correlation coefficient, ρ, or space parameter, α) for TPBI-OD severities, TPBI-TPPD severities and TPPD-OD severities shown in Tables 5-7 indicate that the dependence between claim types is significant. In particular, the log likelihood in Tables 5-7 shows that the t-copula is a better model than the normal copula within the elliptical family, whereas the Clayton copula is an improvement over the Frank copula within the Archimedean family. Based on the AIC and BIC, the Clayton copula is the best model for accommodating the dependence between TPBI and OD claim severities and between TPPD and OD claim severities, whereas the t-copula is the best copula for modeling the dependence between TPBI and TPPD claim severities.
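The model comparison can be sketched in Python using the standard definitions AIC $= -2\ell + 2k$ and BIC $= -2\ell + k \ln n$, where $\ell$ is the maximized log likelihood, $k$ the number of estimated parameters and $n$ the number of observations. The log likelihoods, parameter counts and sample size below are hypothetical and are not the values reported in Tables 5-7.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 * loglik + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 * loglik + k * ln(n)."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical fits: (maximized log likelihood, number of parameters).
# The second model has one extra parameter and a slightly higher likelihood.
fits = {
    "one_parameter_copula": (-1050.0, 9),
    "two_parameter_copula": (-1049.5, 10),
}
n_obs = 500  # hypothetical number of observations

# Smaller values of AIC and BIC indicate the preferred model.
best_by_aic = min(fits, key=lambda name: aic(*fits[name]))
best_by_bic = min(fits, key=lambda name: bic(*fits[name], n_obs))
```

Because both criteria penalize the number of parameters, a copula with an extra parameter, such as the t-copula with its degrees of freedom, can have the higher log likelihood and still be ranked below a one-parameter copula.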

CONCLUSION
This study models the dependence between insurance claim types using copulas on the Malaysian motor insurance claim severity data, which were divided into three types: TPBI, TPPD and OD. Four types of copulas, namely the normal, t, Frank and Clayton, are fitted to the severity data. One main advantage of using a copula is that each marginal distribution can be specified independently, based on the distribution of the individual variable, and then joined by the copula, which takes into account the dependence between these variables. The marginal distributions selected for the TPBI, TPPD and OD claim severities are the Gamma, Gamma and inverse Gaussian regression models respectively. Based on the log likelihood, the t-copula is superior to the normal copula and the Clayton copula is better than the Frank copula for all of the TPBI-OD, TPBI-TPPD and TPPD-OD claim severities. The AIC and BIC indicate that the Clayton is the best copula for modeling TPBI-OD and TPPD-OD severities, whereas the t-copula is the best copula for modeling TPBI-TPPD severities.