Negative Binomial-Lindley Distribution and Its Application

Problem statement: The modeling of claim counts is one of the most important topics in actuarial theory and practice. Many attempts have been made to expand the classes of mixed and compound distributions, especially within the exponential family, resulting in a better fit to count data. In some cases, mixed distributions, in particular mixed Poisson and mixed negative binomial, have been shown to provide a better fit than other distributions. Approach: In this study, we introduce a new mixed negative binomial distribution by mixing the distributions of negative binomial (r,p) and Lindley (θ), where the reparameterization of p = exp(-λ) is considered. Results: The closed form and the factorial moment of the new distribution, i.e., the negative binomial-Lindley distribution, are derived. In addition, parameter estimation for the negative binomial-Lindley distribution via the method of moments (MME) and the Maximum Likelihood Estimation (MLE) is provided. Conclusion: The negative binomial-Lindley distribution is applied to two samples of insurance data. Based on the results, the negative binomial-Lindley distribution provides a better fit than the Poisson and the negative binomial for count data where the probability at zero has a large value.


INTRODUCTION
The modeling of claim counts is one of the most important topics in actuarial theory and practice. In some cases, mixed distributions, in particular mixed Poisson and mixed negative binomial, have been shown to provide a better fit to count data than other distributions. Examples of mixed Poisson and mixed negative binomial distributions are the negative binomial obtained as a mixture of Poisson and gamma (Klugman et al., 2008; Lemaire, 1979; Simon, 1961), the negative binomial-Pareto (Klugman et al., 2008; Meng et al., 1999) and the Poisson-inverse Gaussian (Klugman et al., 2008; Tremblay, 1992; Willmot, 1987).
The mixing of negative binomial with inverse Gaussian, which considers the reparameterization of p = exp(-λ), was introduced by Gómez et al. (2008), who considered the distribution in univariate and multivariate versions and estimated the parameters using the method of moments and maximum likelihood estimation. The Poisson-Lindley distribution for modeling count data was introduced by Sankaran (1970). Ghitany et al. (2008a) showed that in many ways the Lindley distribution is a better model than the exponential and hence one should expect the Poisson-Lindley to provide a better fit than the Poisson-exponential. The zero-truncated Poisson-Lindley distribution was introduced by Ghitany et al. (2008a; 2008b), who used it for modeling count data in the case where the distribution has to be adjusted for the count of missing zeros.
In this study, we introduce a new mixed negative binomial distribution by mixing the distributions of negative binomial (r, p) and Lindley (θ), where the reparameterization of p = exp(-λ) is considered. This new mixed distribution has a thick tail and may be considered as an alternative for modeling count data of insurance claims which has a thick tail and a large value at zero.

MATERIALS AND METHODS
Closed form and factorial moment: In this part, the closed form and the factorial moment of the negative binomial-Lindley distribution are given.
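A classical negative binomial distribution has probability mass function (pmf):

$$\Pr(X = x) = \binom{r+x-1}{x} p^{r} (1-p)^{x}, \quad x = 0, 1, 2, \ldots \tag{1}$$

where r>0 and 0<p<1. The first two moments about zero and the factorial moment of order k of a negative binomial distribution are given respectively by (Gómez et al., 2008):

$$E(X) = \frac{r(1-p)}{p}, \qquad E(X^{2}) = \frac{r(1-p)\,[1 + r(1-p)]}{p^{2}}, \qquad \mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)} \frac{(1-p)^{k}}{p^{k}}, \quad k = 1, 2, \ldots \tag{2}$$

The Lindley distribution, which is specified by the probability density function (pdf):

$$f(x) = \frac{\theta^{2}}{\theta+1} (1+x)\, e^{-\theta x}, \quad x > 0, \ \theta > 0 \tag{3}$$

was introduced by Lindley (1958). It can easily be shown that the Lindley distribution is a mixture of gamma and exponential distributions:

$$f(x) = p\, f_{1}(x) + (1-p)\, f_{2}(x), \qquad p = \frac{1}{1+\theta}$$

where f₁ is the Gamma(2,θ) density and f₂ is the exponential(θ) density. The moment generating function of the Lindley distribution is given by:

$$M_{X}(t) = E(e^{tX}) = \frac{\theta^{2}}{\theta+1} \cdot \frac{\theta - t + 1}{(\theta - t)^{2}}, \quad t < \theta \tag{4}$$

Definition: A random variable X has a negative binomial-Lindley (r,θ) distribution if it satisfies the stochastic representation:

$$X \mid \lambda \sim \mathrm{NB}(r,\ p = e^{-\lambda}) \tag{5}$$

and

$$\lambda \sim \mathrm{Lin}(\theta) \tag{6}$$

where r>0 and θ>0.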
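The mixture representation of the Lindley distribution stated above can be checked numerically. The following Python sketch (standard library only) compares the Lindley density with the two-component mixture, using weight 1/(1+θ) on Gamma(2,θ) and θ/(1+θ) on exponential(θ), and uses the same mixture to draw Lindley variates. The function names `lindley_pdf`, `lindley_mixture_pdf` and `sample_lindley`, the seed and the value θ = 2 are assumptions introduced only for this illustration.

```python
import math
import random

def lindley_pdf(x, theta):
    """Lindley density theta^2/(theta+1) * (1+x) * exp(-theta*x), x > 0."""
    return theta**2 / (theta + 1.0) * (1.0 + x) * math.exp(-theta * x)

def lindley_mixture_pdf(x, theta):
    """Same density written as p*Gamma(2, theta) + (1-p)*Exp(theta), p = 1/(1+theta)."""
    p = 1.0 / (1.0 + theta)
    gamma2 = theta**2 * x * math.exp(-theta * x)   # Gamma(shape 2, rate theta)
    expo = theta * math.exp(-theta * x)            # Exponential(rate theta)
    return p * gamma2 + (1.0 - p) * expo

def sample_lindley(theta, rng):
    """Draw one Lindley(theta) variate through the two-component mixture."""
    if rng.random() < 1.0 / (1.0 + theta):
        return rng.gammavariate(2.0, 1.0 / theta)  # second argument is the scale
    return rng.expovariate(theta)                  # argument is the rate

rng = random.Random(0)   # fixed seed, chosen arbitrarily
theta = 2.0              # illustrative value, not taken from the data
draws = [sample_lindley(theta, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
exact_mean = (theta + 2.0) / (theta * (theta + 1.0))  # mean of Lindley(theta)
print(sample_mean, exact_mean)
```

The sample mean should be close to the exact Lindley mean (θ+2)/(θ(θ+1)); the agreement is only approximate because it depends on the simulated draws.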
Throughout this study, we will use the notation NB-L(r,θ) as a reference for the negative binomial-Lindley distribution. Figures 1-4 show that the pmf of NB-L(r,θ) has its highest mass at zero for several values of r and θ. Hence, the NB-L(r,θ) may provide a good fit for claim count data in the case where the probability at zero has a large value.
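Theorem: Let X~NB-L(r,θ) be a negative binomial-Lindley distribution defined in (5) and (6). The probability mass function (pmf) and the factorial moment of order k of the distribution are given as (7) and (8) respectively:

$$\Pr(X = x) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j}\, \frac{\theta^{2}(\theta + r + j + 1)}{(\theta+1)(\theta + r + j)^{2}}, \quad x = 0, 1, 2, \ldots \tag{7}$$

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)} \sum_{j=0}^{k} \binom{k}{j} (-1)^{j}\, \frac{\theta^{2}(\theta - k + j + 1)}{(\theta+1)(\theta - k + j)^{2}}, \quad \theta > k \tag{8}$$

Proof: If X|λ~NB(r, p = exp(-λ)) and λ~Lin(θ), then the pmf of X can be obtained by:

$$\Pr(X = x) = \int_{0}^{\infty} f_{1}(x \mid \lambda)\, f_{2}(\lambda; \theta)\, d\lambda \tag{9}$$

where f₁(x|λ) is the pmf of NB(r, p = exp(-λ)) and f₂(λ;θ) is the pdf of Lin(θ). We know that:

$$f_{1}(x \mid \lambda) = \binom{r+x-1}{x} e^{-\lambda r} (1 - e^{-\lambda})^{x} = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j} e^{-\lambda (r+j)} \tag{10}$$

By inserting (10) in (9) we have:

$$\Pr(X = x) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j} \int_{0}^{\infty} e^{-\lambda (r+j)} f_{2}(\lambda; \theta)\, d\lambda = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j} M_{\lambda}(-(r+j)) \tag{11}$$

where M_λ(t) = E(exp(tλ)) is the moment generating function of λ. Then, (7) is obtained by using (4) in (11). Gómez et al. (2008) showed that the factorial moment of order k of a mixed negative binomial distribution where p = exp(-λ) can be obtained by using:

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)} \sum_{j=0}^{k} \binom{k}{j} (-1)^{j} M_{\lambda}(k-j) \tag{12}$$

Therefore, (8) is obtained by inserting (4) in (12). From (12), the first two moments of the NB-L(r,θ) are given by:

$$E(X) = r \left( \frac{\theta^{3}}{(\theta+1)(\theta-1)^{2}} - 1 \right), \quad \theta > 1 \tag{13}$$

$$E(X^{2}) = (r^{2} + r)\, \frac{\theta^{2}(\theta-1)}{(\theta+1)(\theta-2)^{2}} - (2r^{2} + r)\, \frac{\theta^{3}}{(\theta+1)(\theta-1)^{2}} + r^{2}, \quad \theta > 2 \tag{14}$$

Parameter estimation: In this segment, the estimation of parameters for NB-L(r,θ) via the method of moments and the maximum likelihood procedure is provided.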
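Before turning to the individual estimators, a numerical check of the pmf and the first moment, on which both procedures rely, may be useful. The sketch below evaluates the NB-L(r,θ) pmf as the generalized binomial coefficient Γ(r+x)/(Γ(r)x!) times the alternating sum of Lindley mgf values at -(r+j), j = 0,...,x. Exact rational arithmetic (`fractions.Fraction`) is an implementation choice made here to avoid cancellation in the alternating sum; it is not taken from the source, and the names `lindley_mgf` and `nbl_pmf` as well as the values r = 2, θ = 5 are hypothetical.

```python
from fractions import Fraction
from math import comb

def lindley_mgf(t, theta):
    """Lindley mgf theta^2 (theta - t + 1) / ((theta + 1)(theta - t)^2), valid for t < theta."""
    return theta**2 * (theta - t + 1) / ((theta + 1) * (theta - t) ** 2)

def nbl_pmf(x, r, theta):
    """NB-L(r, theta) pmf at the non-negative integer x, as an exact Fraction."""
    r, theta = Fraction(r), Fraction(theta)
    # Generalized binomial coefficient C(r + x - 1, x) for real r > 0.
    coef = Fraction(1)
    for i in range(x):
        coef *= (r + i) / (i + 1)
    # Alternating sum of mgf values evaluated at -(r + j).
    total = Fraction(0)
    for j in range(x + 1):
        total += (-1) ** j * comb(x, j) * lindley_mgf(-(r + j), theta)
    return coef * total

r, theta = 2, 5                      # illustrative parameter values
probs = [nbl_pmf(x, r, theta) for x in range(151)]
total_mass = float(sum(probs))       # mass on 0..150, slightly below 1
mean_truncated = float(sum(x * p for x, p in enumerate(probs)))
mean_formula = float(r * (lindley_mgf(1, Fraction(theta)) - 1))  # r (M(1) - 1)
print(float(probs[0]), total_mass, mean_truncated, mean_formula)
```

For these values the mass at zero is exactly 100/147, which illustrates the large probability at zero discussed above. The truncation point 150 is arbitrary; the truncated mean only approximates the closed-form first moment.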

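Method of moments (MME): For the method of moments, the parameters may be obtained by equating the sample and the theoretical moments. In the case of NB-L(r,θ), the first two moments are required for estimating r and θ. Writing m₁ and m₂ for the first two sample moments about zero, i.e., the averages of the xᵢ and of the xᵢ² over the n observations, the estimates of r and θ solve:

$$m_{1} = r \left( \frac{\theta^{3}}{(\theta+1)(\theta-1)^{2}} - 1 \right) \tag{15}$$

$$m_{2} = (r^{2} + r)\, \frac{\theta^{2}(\theta-1)}{(\theta+1)(\theta-2)^{2}} - (2r^{2} + r)\, \frac{\theta^{3}}{(\theta+1)(\theta-1)^{2}} + r^{2} \tag{16}$$

The second moment exists only for θ>2, so the solution is sought in that region.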
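The two moment equations have no closed-form solution, so they are solved numerically. One possible approach, sketched below under assumptions that are not part of the source, is to eliminate r through the first-moment equation, r = m₁/(M(1) - 1) with M the Lindley mgf, and to solve the remaining equation in θ by bisection on (2, 1000). The bracket, the iteration count and the function names `nbl_moments` and `nbl_mme` are hypothetical choices; the routine raises an error when the sample moments do not produce a sign change on the bracket.

```python
def lindley_mgf(t, theta):
    """Lindley mgf, valid for t < theta."""
    return theta**2 * (theta - t + 1) / ((theta + 1) * (theta - t) ** 2)

def nbl_moments(r, theta):
    """First two moments about zero of NB-L(r, theta); requires theta > 2."""
    g1, g2 = lindley_mgf(1, theta), lindley_mgf(2, theta)
    ex = r * (g1 - 1)
    ex2 = (r * r + r) * g2 - (2 * r * r + r) * g1 + r * r
    return ex, ex2

def nbl_mme(m1, m2, lo=2.0 + 1e-9, hi=1e3, iters=200):
    """Method-of-moments estimates (r, theta) from sample moments m1 and m2."""
    def r_of(theta):
        # r implied by matching the first moment at this theta
        return m1 / (lindley_mgf(1, theta) - 1)

    def gap(theta):
        # theoretical second moment minus the sample second moment
        return nbl_moments(r_of(theta), theta)[1] - m2

    if not (gap(lo) > 0 > gap(hi)):
        raise ValueError("no sign change on the bracket; MME not available here")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid       # second moment still too large: move theta up
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return r_of(theta), theta

# Round trip on theoretical moments (illustrative parameter values only).
m1, m2 = nbl_moments(2.0, 5.0)
r_hat, theta_hat = nbl_mme(m1, m2)
print(r_hat, theta_hat)
```

In practice `m1` and `m2` would be computed from the observed claim counts; the round trip above only checks that the solver inverts the moment formulas.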

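Maximum Likelihood Estimation (MLE): The log likelihood function of the NB-L(r,θ) for a sample x₁,...,xₙ is given by:

$$\ell(r,\theta) = n \log\frac{\theta^{2}}{\theta+1} + \sum_{i=1}^{n} \log\binom{r+x_{i}-1}{x_{i}} + \sum_{i=1}^{n} \log \sum_{j=0}^{x_{i}} \binom{x_{i}}{j} (-1)^{j}\, \frac{\theta + r + j + 1}{(\theta + r + j)^{2}} \tag{17}$$

By taking partial derivatives of the log likelihood function with respect to r and θ and by equating both partial derivatives to zero, we obtain the equations:

$$\frac{\partial \ell}{\partial r} = \sum_{i=1}^{n} \frac{\partial}{\partial r} \log\binom{r+x_{i}-1}{x_{i}} - \sum_{i=1}^{n} \frac{\sum_{j=0}^{x_{i}} \binom{x_{i}}{j} (-1)^{j}\, \dfrac{\theta + r + j + 2}{(\theta + r + j)^{3}}}{\sum_{j=0}^{x_{i}} \binom{x_{i}}{j} (-1)^{j}\, \dfrac{\theta + r + j + 1}{(\theta + r + j)^{2}}} = 0 \tag{18}$$

$$\frac{\partial \ell}{\partial \theta} = \frac{n(\theta+2)}{\theta(\theta+1)} - \sum_{i=1}^{n} \frac{\sum_{j=0}^{x_{i}} \binom{x_{i}}{j} (-1)^{j}\, \dfrac{\theta + r + j + 2}{(\theta + r + j)^{3}}}{\sum_{j=0}^{x_{i}} \binom{x_{i}}{j} (-1)^{j}\, \dfrac{\theta + r + j + 1}{(\theta + r + j)^{2}}} = 0 \tag{19}$$

Klugman et al. (2008) showed that:

$$\frac{\partial}{\partial r} \log\binom{r+x-1}{x} = \sum_{k=0}^{x-1} \frac{1}{r+k} \tag{20}$$

Therefore, by using (18)-(20), and because the sum over j depends on r and θ only through θ + r, so that the second terms of (18) and (19) coincide, we have:

$$\sum_{i=1}^{n} \sum_{k=0}^{x_{i}-1} \frac{1}{r+k} = \frac{n(\theta+2)}{\theta(\theta+1)} \tag{21}$$

The solution for r in Eq. 23 may be obtained numerically by using the Newton-Raphson method. The required equation for the kth iteration is:

$$r_{k} = r_{k-1} - \frac{f(r_{k-1})}{f'(r_{k-1})}$$

where f(r) = 0 denotes Eq. 23 and f' is the derivative of f with respect to r. The solution for θ can be obtained by inserting r in (21).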
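The building blocks of this procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The function `theta_given_r` uses a relation that follows from the two score equations: since the alternating sum in the pmf depends on r and θ only through r + θ, subtracting the scores gives Σᵢ Σₖ 1/(r+k) = n(θ+2)/(θ(θ+1)), which is a quadratic in θ. The function `profile_score` is the θ-score evaluated along that curve, and `newton_raphson` is a generic iteration with a central-difference derivative. All names and the frequency-table input format `{count: number of policies}` are hypothetical, and plain floating-point alternating sums are only reliable for the small counts typical of claim data.

```python
import math

def _s_and_ds(x, u):
    """Alternating sum in the pmf and its derivative with respect to u = r + theta."""
    s = ds = 0.0
    for j in range(x + 1):
        c = math.comb(x, j) * (-1) ** j
        s += c * (u + j + 1) / (u + j) ** 2
        ds -= c * (u + j + 2) / (u + j) ** 3
    return s, ds

def loglik(r, theta, freq):
    """NB-L log likelihood for a frequency table {count value: frequency}."""
    n = sum(freq.values())
    total = n * (2 * math.log(theta) - math.log(theta + 1))
    for x, w in freq.items():
        log_binom = math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
        total += w * (log_binom + math.log(_s_and_ds(x, r + theta)[0]))
    return total

def theta_given_r(r, freq):
    """Positive root of a*theta^2 + (a - n)*theta - 2n = 0; needs some count > 0."""
    n = sum(freq.values())
    a = sum(w * sum(1.0 / (r + k) for k in range(x)) for x, w in freq.items())
    return ((n - a) + math.sqrt((n - a) ** 2 + 8 * n * a)) / (2 * a)

def profile_score(r, freq):
    """Score for theta at theta = theta_given_r(r); zero at a stationary point."""
    theta = theta_given_r(r, freq)
    n = sum(freq.values())
    score = n * (theta + 2) / (theta * (theta + 1))
    for x, w in freq.items():
        s, ds = _s_and_ds(x, r + theta)
        score += w * ds / s
    return score

def newton_raphson(f, r0, h=1e-6, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration r_k = r_{k-1} - f(r_{k-1}) / f'(r_{k-1})."""
    r = r0
    for _ in range(max_iter):
        slope = (f(r + h) - f(r - h)) / (2 * h)  # central-difference derivative
        step = f(r) / slope
        r -= step
        if abs(step) < tol:
            break
    return r
```

A fit would then iterate `newton_raphson(lambda r: profile_score(r, freq), r0)` from a starting value `r0`, for instance the method-of-moments estimate, and recover θ from `theta_given_r`. Convergence is not guaranteed and depends on the starting value, so the resulting log likelihood should be compared with nearby parameter values.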

RESULTS AND DISCUSSION
The NB-L(r,θ) distribution is applied to two samples of insurance count data and the results are presented in this section.
Example 1: The data for this example were taken from Klugman et al. (2008); they were collected by Dropkin in 1956-1958 and analyzed in a paper in 1959. The NB-L, Poisson and negative binomial distributions are fitted to the data using R and the results are provided in Table 1. Based on the log likelihood and p-value, the NB-L distribution provides the best fit for the data. The p-value of the chi-square statistic for the NB-L is 99.9%.

Example 2: The data, which were obtained from Klugman et al. (2008), provide information on 9,461 automobile insurance policies, where the number of accidents for each policy is recorded. The Poisson, negative binomial and NB-L distributions are fitted to the data using R. Based on the log likelihood and p-value, the negative binomial provides a better fit than the Poisson. However, the p-value of the chi-square statistic for the negative binomial is still small. The NB-L distribution provides a clear improvement over both the Poisson and the negative binomial, giving a much better fit for the data (Table 2).

CONCLUSION
In this study, a two-parameter negative binomial-Lindley distribution, NB-L(r,θ), is introduced by mixing the distributions of negative binomial (r,p) and Lindley (θ), where the reparameterization of p = exp(-λ) is considered. In particular, the closed form and the factorial moment of NB-L(r,θ) are derived. In addition, parameter estimation for NB-L(r,θ) via the method of moments (MME) and the Maximum Likelihood Estimation (MLE) is shown. The NB-L(r,θ) is applied to two samples of insurance data. Based on the results, NB-L(r,θ) provides a better fit than the Poisson and the negative binomial for count data where the probability at zero has a large value. In particular, Figs. 1-4 show that the pmf of NB-L(r,θ) has its highest mass at zero for several values of r and θ, and the log likelihood and p-values in Tables 1 and 2 show that the NB-L(r,θ) provides a better fit for the sample data than the Poisson and negative binomial distributions.