A New Family: Σh Distributions

Corresponding Author: Yasser Mohamed Amer, Department of Statistics and Insurance, Cairo Higher Institutes in Mokattam, Cairo, Egypt. Email: yasseramer4@yahoo.com

Abstract: In this study, a new method is proposed for generating families of distributions whose hazard function is the sum of the hazard functions of two distributions; we call these the Σh distributions. This new family supports the modeling of a wider range of lifetime data. Several new distributions that are members of the family are presented, with emphasis on the Σh Exponential-Lomax distribution, whose statistical properties are derived in detail. Maximum likelihood estimation of the parameters of the Σh Exponential-Lomax distribution is discussed, together with a Monte Carlo simulation study that assesses the accuracy and performance of the estimation procedure. Finally, the Σh Exponential-Lomax distribution is fitted to a real data set to demonstrate its applicability.


Introduction
Probability distributions are widely used in many areas of real-world application, and the standard distributions have a long history in statistical practice. However, in many practical areas the classical distributions do not provide an adequate fit to the data, which creates a clear need for extended versions of these classical distributions. Serious attempts have been made in this regard to propose new families of distributions that extend the existing well-known distributions. Among those who helped develop these families, Azzalini (1985) proposed a method of obtaining weighted distributions from independent and identically distributed (i.i.d.) random variables. Gupta et al. (1998) proposed to model failure time data by $F^*(t) = [F(t)]^\theta$, where F(t) is the baseline distribution function and θ is a positive real number. This model gives rise to monotonic as well as non-monotonic failure rates even though the baseline failure rate is monotonic. This approach has since attracted several authors; see, for example, Gupta and Kundu (1999), among others. Eugene et al. (2002) introduced a general class of distributions called the Beta-G distributions, which extends the distribution of order statistics. This class is generated from the logit of the beta random variable and provides great flexibility in modeling not only symmetric heavy-tailed distributions, but also skewed and bimodal distributions. Another family, known as the Kum-G distributions, was proposed by Cordeiro and Castro (2011), who describe a new family of generalized distributions (denoted with the prefix "Kw") that extends the normal, Weibull, gamma, Gumbel and inverse Gaussian distributions, among several other well-known distributions.
Another method of generating families of distributions was proposed by Shaw and Buckley (2007), who used a quadratic transmutation map to generate new probability distributions from any baseline distribution (Gupta and Kundu, 2009; Shahbaz et al., 2010). Alzaatreh et al. (2013) proposed a general method of extending probability distributions, the technique that generates the T-X family. With it, one can develop new distributions that are very general and flexible, or that are tailored to specific types of data such as highly left-tailed, right-tailed, thin-tailed or heavy-tailed distributions, as well as bimodal distributions. The Beta-G, Kum-G and T-X families all build on some baseline distribution.
Two functions are commonly used to characterize the distribution of a lifetime T: the survival function, which is the probability of an individual surviving beyond time t, and the hazard rate, which is approximately the chance that an individual of age t experiences the event in the next instant of time. In reliability, the survival probability is the proportion of units that survive beyond a specified time, and estimates of survival probabilities are frequently referred to as reliability estimates. The survival function gives the probability that a subject will survive past time t: S(t) = P(T > t) = 1 − F(t).
For each time interval, the survival probability is calculated as the number of subjects surviving divided by the number of subjects at risk. Subjects who have died, dropped out or moved away are not counted as "at risk"; that is, subjects who are lost are considered "censored" and are not counted in the denominator.
The hazard function h has roots in many fields. It is used to model the onset or relapse of a disease in biostatistics, the time until a person becomes employed in economics, the time until a device fails in reliability engineering and the time to death in actuarial science, among many other uses.
In this study, we build on the hazard function h: a new distribution is obtained by adding the hazard function of one distribution to the hazard function of a second distribution. The aim is to improve the characteristics and flexibility of the existing distributions and to introduce extended versions of the baseline distributions that have a closed-form hazard function.
The proposed family forms new distributions by combining n distributions. The hazard function h of the resulting distribution is the sum of the hazard functions of the distributions used in its construction, and we therefore call the new family the Σh distributions.
The paper is structured as follows. The new family of Σh distributions is introduced in section 2. Some examples of Σh distributions are presented in section 3. The Σh Exponential-Lomax distribution is explored in detail in section 4. Section 5 gives expressions for the moments, quantiles, reliability function and random number generation for the proposed Σh Exponential-Lomax distribution. Section 6 presents the estimation of the parameters, and a real-life application along with a simulation study is given in section 7. Finally, some concluding remarks are given in section 8.

A New Family Σh Distributions
Assume we have n random variables, each with probability density function $f_k(x)$, where k = 1,…,n and x > 0. The survival function of the kth variable, which gives the probability that the event of interest has not yet occurred by time x, is $S_k(x)$.

Theorem 2.1
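The probability density function (pdf) of the Σh distribution is given by:

$$f(x) = \left[\sum_{k=1}^{n} h_k(x)\right] \prod_{k=1}^{n} S_k(x), \quad x > 0 \qquad (1)$$

and the cumulative distribution function (cdf) is:

$$F(x) = 1 - \prod_{k=1}^{n} S_k(x) \qquad (2)$$

where $h_k(x)$ is the hazard rate function of distribution number k, given as:

$$h_k(x) = \frac{f_k(x)}{S_k(x)} \qquad (3)$$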

Proof
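The proof follows directly from the relation $S(x) = \exp\left(-\int_0^x h(u)\,du\right)$ between a survival function and its hazard rate: if $h(x) = \sum_{k} h_k(x)$, then $S(x) = \prod_{k} S_k(x)$, and the cdf and pdf follow from $F(x) = 1 - S(x)$ and $f(x) = h(x)\,S(x)$.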
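The construction can be illustrated with a minimal Python sketch. It assumes each component distribution is supplied as a pair of callables (pdf, survival function); the names `sigma_h_hazard`, `sigma_h_survival`, `sigma_h_cdf`, `sigma_h_pdf` and the helper `exponential` are hypothetical and not part of the paper. It relies only on the fact that a hazard rate is minus the derivative of the log-survival function, so that adding hazards corresponds to multiplying survival functions.

```python
import math
from typing import Callable, Sequence, Tuple

# A component is a (pdf, survival function) pair of callables defined for x > 0.
Component = Tuple[Callable[[float], float], Callable[[float], float]]


def sigma_h_hazard(components: Sequence[Component], x: float) -> float:
    """Hazard of the combined distribution: the sum of h_k(x) = f_k(x) / S_k(x)."""
    return sum(pdf(x) / sf(x) for pdf, sf in components)


def sigma_h_survival(components: Sequence[Component], x: float) -> float:
    """Survival function: summed hazards correspond to multiplied survivals."""
    return math.prod(sf(x) for _, sf in components)


def sigma_h_cdf(components: Sequence[Component], x: float) -> float:
    """cdf of the combined distribution, 1 - S(x)."""
    return 1.0 - sigma_h_survival(components, x)


def sigma_h_pdf(components: Sequence[Component], x: float) -> float:
    """pdf of the combined distribution, h(x) * S(x)."""
    return sigma_h_hazard(components, x) * sigma_h_survival(components, x)


def exponential(rate: float) -> Component:
    """Hypothetical helper: exponential component with constant hazard `rate`."""
    return (lambda x: rate * math.exp(-rate * x), lambda x: math.exp(-rate * x))


# Two exponential components with illustrative rates 0.5 and 1.5.
components = [exponential(0.5), exponential(1.5)]
print(sigma_h_hazard(components, 0.7))  # the two constant hazards add up
print(sigma_h_cdf(components, 0.7))     # cdf of an exponential with rate 0.5 + 1.5
```

With exponential components the combined hazard is again constant, so the result is an exponential distribution whose rate is the sum of the component rates.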

The Hazard Rate Function
Using Equation 1 and Equation 2, the hazard rate function of the Σh distribution is:

$$h(x) = \frac{f(x)}{1 - F(x)} = \sum_{k=1}^{n} h_k(x)$$

Proposition 1
For the pdf of the Σh distributions, we have: It is worth noting that if the value of ( )

Proposition 2
The residual lifetime R(t) of the Σh distribution is given as:

Special Case
If the distributions $f_k(x)$, k = 1,…,n, are all exponential distributions, then:

Proposition 3
The reverse residual lifetime $\bar{R}(t)$ of the Σh distribution is given as:

Special Case
The same for the exponential distributions:

Proposition 4: Quantile and Median
The quantile $x_q$ of the Σh distribution is obtained by solving the following equation with respect to $x_q$:

$$F(x_q) = P(X \le x_q) = q, \quad 0 < q < 1$$

Thus the quantile of the Σh distributions can be obtained from the following equation:

$$\prod_{k=1}^{n} S_k(x_q) = 1 - q \qquad (12)$$

The median can be found from Equation 12 by setting q = 0.5.

Special Sub Distributions of Σh Distributions
In this section, some special sub-distributions of the new family are introduced. We take the number of component distributions to be two, that is, n = 2.

Σh Weibull-Lomax Distribution
Suppose that f 1 (x)∼Weibull(θ, a) and f 2 (x)∼Lomax(α, β), then the Σh Weibull-Lomax distribution has cdf: The density function of Σh Weibull-Lomax distribution can be written as: and the hazard rate function of Σh Weibull-Lomax distribution is:

Σh Weibull-Exponential Distribution
Suppose that $f_1(x)$ ∼ Weibull(θ, β) and $f_2(x)$ ∼ exp(α), then the Σh Weibull-Exponential distribution has cdf: The density function of the Σh Weibull-Exponential distribution can be written as: and the hazard rate function of the Σh Weibull-Exponential distribution is: We note that the linear failure rate distribution is a special case of the Σh Weibull-Exponential distribution when the Weibull shape parameter equals 2, where the cdf is as follows:

Σh Gompertz-Lomax Distribution
Suppose that $f_1(x)$ follows a Gompertz distribution and $f_2(x)$ ∼ Lomax(α, β), then the Σh Gompertz-Lomax distribution has cdf: The density function of the Σh Gompertz-Lomax distribution can be written as: and the hazard rate function of the Σh Gompertz-Lomax distribution is:

Σh Lindley-Lomax Distribution
Suppose that $f_1(x)$ ∼ Lindley(θ) and $f_2(x)$ ∼ Lomax(α, β), then the Σh Lindley-Lomax distribution has cdf: The density function of the Σh Lindley-Lomax distribution can be written as: and the hazard rate function of the Σh Lindley-Lomax distribution is:
Likewise, we can generate many more flexible distributions by combining more than two distributions, n > 2. Note that as n increases, the resulting distributions become more complex and have more parameters.
In the remainder of this paper we give a detailed analysis of one member of the family, the Σh Exponential-Lomax distribution. We study the distribution and its characteristics analytically, estimate its parameters using the maximum likelihood method and carry out a simulation to study the behavior of the estimators. We also apply the distribution to a real data set and compare the results with the Lomax and exponential distributions to assess its goodness of fit.

Σh Exponential-Lomax Distribution
In this section, we introduce and study the Σh Exponential-Lomax distribution (Σh EL). The pdf, cdf, reliability and hazard rate functions are defined.
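Suppose that $f_1(x)$ ∼ Exponential(θ) and $f_2(x)$ ∼ Lomax(α, β), then the Σh Exponential-Lomax distribution has cdf:

$$F(x) = 1 - e^{-\theta x}(1+\beta x)^{-\alpha}, \quad x > 0 \qquad (26)$$

The density function of the Σh EL distribution can be written as:

$$f(x) = \left(\theta + \frac{\alpha\beta}{1+\beta x}\right) e^{-\theta x}(1+\beta x)^{-\alpha} \qquad (27)$$

The hazard rate function of the Σh EL is given by:

$$h(x) = \theta + \frac{\alpha\beta}{1+\beta x} \qquad (28)$$

Proposition 4.1
For the pdf of the Σh EL, we have:

$$\lim_{x\to 0} f(x) = \theta + \alpha\beta, \qquad \lim_{x\to\infty} f(x) = 0$$

This means that the pdf curve meets the y-axis at the value θ + αβ and then decreases continuously, as illustrated in Fig. 1.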

Proposition 4.2
For the hazard rate function of the Σh EL (α, θ, β), we have:

$$\lim_{x\to 0} h(x) = \theta + \alpha\beta, \qquad \lim_{x\to\infty} h(x) = \theta$$

This means that the hazard curve meets the y-axis at the value θ + αβ and then decreases toward the value θ, where it levels off as x increases, as shown in Fig. 3.
Setting θ = 0 in the pdf in Equation 27 gives the Lomax distribution as a special case, and setting α = 0 in Equation 27 gives the exponential distribution as a special case.
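The limits and special cases above can be checked numerically with a short Python sketch. It assumes the Lomax component has survival function $(1+\beta x)^{-\alpha}$, which is the parameterization consistent with the stated limit θ + αβ; the function names `el_hazard`, `el_survival`, `el_pdf` and `el_cdf` are hypothetical.

```python
import math


def el_hazard(x, alpha, theta, beta):
    """Exponential hazard theta plus Lomax hazard alpha*beta / (1 + beta*x)."""
    return theta + alpha * beta / (1.0 + beta * x)


def el_survival(x, alpha, theta, beta):
    """Product of the exponential and (assumed) Lomax survival functions."""
    return math.exp(-theta * x) * (1.0 + beta * x) ** (-alpha)


def el_pdf(x, alpha, theta, beta):
    """pdf = hazard * survival."""
    return el_hazard(x, alpha, theta, beta) * el_survival(x, alpha, theta, beta)


def el_cdf(x, alpha, theta, beta):
    """cdf = 1 - survival."""
    return 1.0 - el_survival(x, alpha, theta, beta)


alpha, theta, beta = 2.5, 0.5, 0.2  # parameter values used in the simulation study

print(el_pdf(0.0, alpha, theta, beta))     # value at the origin: theta + alpha*beta
print(el_hazard(1e9, alpha, theta, beta))  # hazard far in the tail: close to theta
# alpha = 0 reduces to the exponential pdf; theta = 0 reduces to the Lomax pdf.
print(el_pdf(1.0, 0.0, theta, beta), theta * math.exp(-theta * 1.0))
print(el_pdf(1.0, alpha, 0.0, beta), alpha * beta * (1.0 + beta) ** (-(alpha + 1.0)))
```

Each of the last two lines prints a pair of values that should agree up to rounding.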

Statistical Properties
In this section, we discuss some distributional properties of the Σh EL (α, θ, β) given in Equation 27, including expressions for the quantiles, moments, moment generating function, characteristic function and entropy.

Quantile, Median and Mode
The quantile $x_q$ of the Σh EL (α, θ, β) is obtained by solving $P(X \le x_q) = q$, 0 < q < 1, with respect to $x_q$.
Thus the quantile of the Σh EL (α, θ, β) can be obtained as a nonnegative solution of the following nonlinear equation:

$$\theta x_q + \alpha \ln(1+\beta x_q) + \ln(1-q) = 0 \qquad (33)$$

The median of the Σh EL (α, θ, β) can be obtained from Equation 33 at q = 0.5.
The mode of the Σh EL distribution is found by differentiating its pdf in Equation 27 with respect to x and setting the derivative to zero, which gives the following nonlinear equation:

$$\frac{\alpha\beta^2}{(1+\beta x)^2} + \left(\theta + \frac{\alpha\beta}{1+\beta x}\right)^2 = 0 \qquad (34)$$

For α, β > 0 the left-hand side is strictly positive, so Equation 34 has no solution on x > 0: the pdf is strictly decreasing and the mode is at x = 0, consistent with Fig. 1. Equation 33, in contrast, has no explicit solution in the general case, and numerical methods such as the bisection or fixed-point method should be used to solve it.
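As an illustration of the numerical approach, the following Python sketch solves the quantile equation F(x) = q by bisection and uses it for inverse-transform random number generation. It assumes the Σh EL cdf $1 - e^{-\theta x}(1+\beta x)^{-\alpha}$; the function names and the tolerance are hypothetical choices, not taken from the paper.

```python
import math
import random


def el_cdf(x, alpha, theta, beta):
    """Assumed Sigma-h EL cdf: 1 - exp(-theta*x) * (1 + beta*x)**(-alpha)."""
    return 1.0 - math.exp(-theta * x) * (1.0 + beta * x) ** (-alpha)


def el_quantile(q, alpha, theta, beta, tol=1e-12):
    """Solve F(x) = q on x >= 0 by bisection (F is continuous and increasing)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if theta <= 0.0 and alpha * beta <= 0.0:
        raise ValueError("need theta > 0 or alpha*beta > 0 for a proper distribution")
    lo, hi = 0.0, 1.0
    # Expand the bracket until it contains the quantile.
    while el_cdf(hi, alpha, theta, beta) < q:
        hi *= 2.0
    # Halve the bracket until it is shorter than the tolerance.
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if el_cdf(mid, alpha, theta, beta) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def el_sample(n, alpha, theta, beta, rng):
    """Inverse-transform sampling: apply the quantile function to uniforms."""
    return [el_quantile(rng.random(), alpha, theta, beta) for _ in range(n)]


median = el_quantile(0.5, 2.5, 0.5, 0.2)
print(median, el_cdf(median, 2.5, 0.5, 0.2))  # the cdf at the median is close to 0.5
print(el_sample(3, 2.5, 0.5, 0.2, random.Random(0)))
```

Bisection is chosen here because the cdf is monotone, so a bracketing interval always exists; a fixed-point or Newton iteration would also work but needs a starting value.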

The Moments
Moments are important in any statistical analysis, especially in applications. They can be used to study the most important features and characteristics of a distribution (e.g., central tendency, dispersion, skewness and kurtosis).
The rth moment of the Σh EL (α, θ, β) is given by the following theorem.

Theorem 5.1
The rth moment of a random variable X ∼ Σh EL (α, θ, β) is given by: where,

Proof
The rth moment of a positive random variable X with probability density function f(x) is given by:

$$E(X^r) = \int_0^\infty x^r f(x)\,dx$$

So: $I_2$ is found in the same way:
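When a closed form is inconvenient, the moment integral $E(X^r) = \int_0^\infty x^r f(x)\,dx$ can be approximated numerically. The Python sketch below is one such approximation, assuming the Σh EL pdf $(\theta + \alpha\beta/(1+\beta x))\,e^{-\theta x}(1+\beta x)^{-\alpha}$ with θ > 0. It uses composite Simpson's rule on a truncated interval, which is an illustrative substitute for the series expressions of the theorem; `el_moment` and the truncation point 50/θ are hypothetical choices.

```python
import math


def el_pdf(x, alpha, theta, beta):
    """Assumed Sigma-h EL pdf: hazard times survival."""
    hazard = theta + alpha * beta / (1.0 + beta * x)
    return hazard * math.exp(-theta * x) * (1.0 + beta * x) ** (-alpha)


def el_moment(r, alpha, theta, beta, panels=20000):
    """Approximate E(X^r) by composite Simpson's rule on [0, 50/theta].

    Beyond 50/theta the survival function is below exp(-50), so the
    truncated tail is negligible for small r. Requires theta > 0.
    """
    if theta <= 0.0:
        raise ValueError("this truncation rule needs theta > 0")
    if panels % 2:
        panels += 1  # Simpson's rule needs an even number of panels
    upper = 50.0 / theta
    h = upper / panels

    def g(x):
        return x ** r * el_pdf(x, alpha, theta, beta)

    total = g(0.0) + g(upper)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


alpha, theta, beta = 2.5, 0.5, 0.2  # parameter values used in the simulation study
mean = el_moment(1, alpha, theta, beta)
variance = el_moment(2, alpha, theta, beta) - mean ** 2
print(mean, variance)
# Sanity check with alpha = 0 (exponential case): E(X) = 1/theta, E(X^2) = 2/theta^2.
print(el_moment(1, 0.0, theta, beta), el_moment(2, 0.0, theta, beta))
```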

Moment Generating Function
In this subsection we derive the moment generating function of Σh EL (α, θ, β) as infinite series expansion.

Theorem 5.2
The moment generating function $M_X(t)$ of a random variable X ∼ Σh EL (α, θ, β) is given by:

Proof
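The moment generating function of a positive random variable X with probability density function f(x) is given by:

$$M_X(t) = E\left(e^{tX}\right) = \int_0^\infty e^{tx} f(x)\,dx$$

Using the series expansion of $e^{tx}$, we obtain:

$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\, E(X^r) \qquad (40)$$

Substituting Equation 35 into Equation 40, we get $M_X(t)$ in Equation 39.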

Characteristic Function
The characteristic function is a function that uniquely characterizes any probability distribution.

Proof
The proof is simple.

Rényi Entropy
Entropy is used to measure the variation of the uncertainty of the random variable X. If X has the probability density function f(⋅), the Rényi entropy (Rényi, 1961) of order ρ > 0, ρ ≠ 1, is defined by:

$$I_R(\rho) = \frac{1}{1-\rho}\log\int_0^\infty f^{\rho}(x)\,dx$$

Theorem 5.4
The Rényi entropy of a random variable X ∼ Σh EL (α, θ, β) is given by:

Proof
Suppose X has the pdf in Equation 27. Then, one can calculate: Let: Substituting Equation 46 into Equation 45, we get:

Shannon Entropy
Entropy measures the uncertainty of a random variable X. Shannon (1948) defined the entropy of a random variable X.

Order Statistics
In statistics, the kth order statistic of a statistical sample is equal to its kth smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference. For a sample of size n, the nth order statistic (or largest order statistic) is the maximum, that is, $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$, and the first order statistic (or smallest order statistic) is the minimum, that is, $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$.
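The sample range is the difference between the maximum and minimum. It is clearly a function of the order statistics:

$$\text{Range} = X_{(n)} - X_{(1)}$$

Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ denote the order statistics of a random sample $X_1, X_2, \ldots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the pdf of $X_{(k)}$ is given by:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, f_X(x)\,[F_X(x)]^{k-1}\,[1-F_X(x)]^{n-k}$$

for k = 1, 2,…, n. Applying the binomial expansion to $[1-F_X(x)]^{n-k}$, we obtain:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!} \sum_{j=0}^{n-k} (-1)^j \binom{n-k}{j} f_X(x)\,[F_X(x)]^{k-1+j}$$

The pdf of the kth order statistic for Σh EL (α, θ, β) is obtained by substituting the Σh EL pdf and cdf into this expression. In particular, the pdf of the largest order statistic $X_{(n)}$ is given by:

$$f_{X_{(n)}}(x) = n\, f_X(x)\,[F_X(x)]^{n-1}$$

and the pdf of the smallest order statistic $X_{(1)}$ is given by:

$$f_{X_{(1)}}(x) = n\, f_X(x)\,[1-F_X(x)]^{n-1}$$

The rth order moment of $X_{(k)}$ for Σh EL (α, θ, β) is obtained by using: where,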
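The order-statistic density can be evaluated directly from the general formula $f_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!} f(x) F(x)^{k-1} [1-F(x)]^{n-k}$. The Python sketch below does this for the Σh EL, assuming survival function $e^{-\theta x}(1+\beta x)^{-\alpha}$; the function names are hypothetical. It also checks a closure property that follows from the product form of the survival function: the minimum of n i.i.d. Σh EL (α, θ, β) variables is again Σh EL, with parameters (nα, nθ, β).

```python
import math


def el_survival(x, alpha, theta, beta):
    """Assumed Sigma-h EL survival function."""
    return math.exp(-theta * x) * (1.0 + beta * x) ** (-alpha)


def el_pdf(x, alpha, theta, beta):
    """Assumed Sigma-h EL pdf: hazard times survival."""
    hazard = theta + alpha * beta / (1.0 + beta * x)
    return hazard * el_survival(x, alpha, theta, beta)


def order_stat_pdf(x, k, n, alpha, theta, beta):
    """pdf of the k-th order statistic in an i.i.d. sample of size n."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    s = el_survival(x, alpha, theta, beta)
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coef * el_pdf(x, alpha, theta, beta) * (1.0 - s) ** (k - 1) * s ** (n - k)


alpha, theta, beta = 2.5, 0.5, 0.2  # parameter values used in the simulation study
n, x = 5, 0.3

# Smallest order statistic versus a Sigma-h EL pdf with parameters (n*alpha, n*theta, beta).
print(order_stat_pdf(x, 1, n, alpha, theta, beta))
print(el_pdf(x, n * alpha, n * theta, beta))
```

The two printed values should agree up to rounding, since $n f(x) S(x)^{n-1} = n\,h(x)\,S(x)^n$.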

Maximum Likelihood Estimation (MLE)
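Let $x_1,…,x_n$ be a random sample of size n from Σh EL (α, θ, β). Then the likelihood function can be written as:

$$L(\alpha,\theta,\beta) = \prod_{i=1}^{n}\left(\theta + \frac{\alpha\beta}{1+\beta x_i}\right) e^{-\theta x_i}(1+\beta x_i)^{-\alpha} \qquad (55)$$

and the log-likelihood function as:

$$\ell(\alpha,\theta,\beta) = \sum_{i=1}^{n}\ln\left(\theta + \frac{\alpha\beta}{1+\beta x_i}\right) - \theta\sum_{i=1}^{n}x_i - \alpha\sum_{i=1}^{n}\ln(1+\beta x_i) \qquad (56)$$

The MLEs of α, θ and β are obtained by maximizing Equation 56. The derivatives of Equation 56 with respect to the unknown parameters are given as:

$$\frac{\partial \ell}{\partial\alpha} = \sum_{i=1}^{n}\frac{\beta}{\theta(1+\beta x_i)+\alpha\beta} - \sum_{i=1}^{n}\ln(1+\beta x_i)$$

$$\frac{\partial \ell}{\partial\theta} = \sum_{i=1}^{n}\frac{1+\beta x_i}{\theta(1+\beta x_i)+\alpha\beta} - \sum_{i=1}^{n}x_i$$

$$\frac{\partial \ell}{\partial\beta} = \sum_{i=1}^{n}\frac{\alpha}{(1+\beta x_i)\left[\theta(1+\beta x_i)+\alpha\beta\right]} - \alpha\sum_{i=1}^{n}\frac{x_i}{1+\beta x_i}$$

Setting these derivatives equal to zero gives the likelihood equations, whose solution gives the maximum likelihood estimators $\hat\alpha$, $\hat\theta$ and $\hat\beta$. The asymptotic variance-covariance matrix of the ML estimators can be approximated by numerically inverting the Fisher information matrix F, which is formed from the second partial derivatives of Equation 56. Thus, the approximate 100(1−γ)% two-sided confidence intervals for α, θ and β are, respectively:

$$\hat\alpha \pm Z_{\gamma/2}\sqrt{\widehat{\mathrm{Var}}(\hat\alpha)}, \qquad \hat\theta \pm Z_{\gamma/2}\sqrt{\widehat{\mathrm{Var}}(\hat\theta)}, \qquad \hat\beta \pm Z_{\gamma/2}\sqrt{\widehat{\mathrm{Var}}(\hat\beta)}$$

where $Z_{\gamma/2}$ is the upper (γ/2)th percentile of the standard normal distribution.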
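A hedged Python sketch of the log-likelihood and its first derivatives is given below, assuming the Σh EL pdf $(\theta + \alpha\beta/(1+\beta x))\,e^{-\theta x}(1+\beta x)^{-\alpha}$. The function names and the small data vector are hypothetical and purely illustrative. The analytic derivatives are compared with central finite differences, which is a common way to validate score functions before passing them to a numerical optimizer.

```python
import math


def el_loglik(data, alpha, theta, beta):
    """Log-likelihood of an i.i.d. sample under the assumed Sigma-h EL pdf."""
    return sum(
        math.log(theta + alpha * beta / (1.0 + beta * x))
        - theta * x
        - alpha * math.log1p(beta * x)
        for x in data
    )


def el_score(data, alpha, theta, beta):
    """Analytic partial derivatives of the log-likelihood (alpha, theta, beta)."""
    d_alpha = d_theta = d_beta = 0.0
    for x in data:
        u = 1.0 + beta * x
        h = theta + alpha * beta / u  # hazard at x
        d_alpha += (beta / u) / h - math.log(u)
        d_theta += 1.0 / h - x
        d_beta += (alpha / u ** 2) / h - alpha * x / u
    return d_alpha, d_theta, d_beta


def numeric_score(data, alpha, theta, beta, eps=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    def diff(lo, hi):
        return (el_loglik(data, *hi) - el_loglik(data, *lo)) / (2.0 * eps)

    return (
        diff((alpha - eps, theta, beta), (alpha + eps, theta, beta)),
        diff((alpha, theta - eps, beta), (alpha, theta + eps, beta)),
        diff((alpha, theta, beta - eps), (alpha, theta, beta + eps)),
    )


# Hypothetical illustrative observations (not the Wheaton River data).
data = [0.4, 1.1, 2.7, 0.05, 3.9]
params = (2.5, 0.5, 0.2)
print(el_score(data, *params))
print(numeric_score(data, *params))
```

In practice the likelihood equations have no closed-form solution, so the log-likelihood would be maximized with a numerical optimizer, with these derivatives supplied as the gradient.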

Numerical Studies
In this section, an extensive Monte Carlo simulation study is carried out to assess the performance of the estimation method. We also consider a real-life data set to investigate the applicability of the Σh EL (α, θ, β) model.

Simulation Study
A Monte Carlo simulation study is carried out for samples of sizes 20, 50, 80, 100 and 200 drawn from the Σh EL (α, θ, β) distribution. The samples are drawn for α = 2.5, θ = 0.5 and β = 0.2, and maximum likelihood estimators of the parameters α, θ and β are obtained. The procedure is repeated 10,000 times and the mean and Root Mean Square Error (RMSE) of the estimates are computed.

The Wheaton River data in Table 2 are the exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in Yukon Territory, Canada. The data consist of 72 exceedances for the years 1958-1984, rounded to one decimal place. For more details about the source of the data, see Akinsete et al. (2008).
In order to assess the performance of the Σh Exponential-Lomax distribution, we have computed various measures for the Exponential (E), Lomax (L), Transmuted Exponential (TE) and Transmuted Lomax (TL) distributions. The estimated parameter values and their Standard Errors (SEs) for the various distributions are given in Table 3. The estimated pdf and cdf for the Wheaton River flood exceedances are plotted over the empirical density and distribution functions, respectively, and presented in the upper panels of Fig. 4 and 5. Table 4 provides the log-likelihood, Akaike's Information Criterion (AIC), corrected Akaike's Information Criterion (AICc), Bayesian Information Criterion (BIC) and Hannan-Quinn Information Criterion (HQIC). From Table 4, we can see that the Σh Exponential-Lomax distribution fits the data best, as it has the smallest values of these criteria.

Measures of goodness of fit typically summarize the discrepancy between the observed values and the values expected under the model. When applied to real data, the new family achieves more flexibility in fitting than each of the distributions that make up the family, as is evident from the application.
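For reference, the four information criteria can be computed from a maximized log-likelihood with the standard formulas, as in the Python sketch below. The function name is hypothetical, and the log-likelihood value in the example is a made-up placeholder, not a result from Table 4; only the sample size of 72 and the three parameters of the Σh EL model come from the text.

```python
import math


def information_criteria(loglik, k, n):
    """Standard model-selection criteria.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    aic = 2.0 * k - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "HQIC": hqic}


# Placeholder log-likelihood (hypothetical), 3 parameters, 72 observations.
criteria = information_criteria(loglik=-250.0, k=3, n=72)
for name, value in criteria.items():
    print(name, round(value, 3))
```

Smaller values indicate a better trade-off between fit and model complexity, which is the sense in which the criteria are compared across candidate distributions.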
Goodness-of-fit tests measure how compatible a random sample is with a theoretical probability distribution function. The most popular nonparametric goodness-of-fit tests include the Kolmogorov-Smirnov $D_n$ and Cramér-von Mises tests.
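The Kolmogorov-Smirnov statistic $D_n$ is the largest absolute difference between the empirical cdf and the fitted cdf. A minimal Python sketch follows, assuming the Σh EL cdf $1 - e^{-\theta x}(1+\beta x)^{-\alpha}$; the function names and the data vector are hypothetical and are not the Wheaton River observations.

```python
import math


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D_n: sup |empirical cdf - model cdf|.

    The supremum is attained at a data point, just before or just after
    the jump of the empirical cdf, so both sides are checked.
    """
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


def el_cdf(x, alpha, theta, beta):
    """Assumed Sigma-h EL cdf."""
    return 1.0 - math.exp(-theta * x) * (1.0 + beta * x) ** (-alpha)


# Hypothetical illustrative observations and parameter values.
data = [0.1, 0.4, 0.9, 1.3, 2.2, 3.8]
d_n = ks_statistic(data, lambda x: el_cdf(x, 2.5, 0.5, 0.2))
print(d_n)
```

A smaller $D_n$ indicates closer agreement between the sample and the fitted distribution.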

Concluding Remarks
In this study, we presented a new family of distributions called the Σh distributions. This family may help in modeling a wider range of lifetime data. We examined the characteristics of the family and presented a number of its member distributions. We also analyzed the Σh Exponential-Lomax distribution as one member of the family: we studied its properties by deriving the moment generating and characteristic functions, quantiles, random number generation, Rényi and Shannon entropies and order statistics, and estimated its parameters using the maximum likelihood method. A simulation study of the estimators showed that the estimates are close to the true parameter values and that they are consistent, since the RMSE decreases as the sample size increases. The parameters of the model were also estimated on a real data set, and the results were compared with those of the Lomax and exponential distributions; the comparison shows that the proposed distribution provides a better fit than these distributions.
The basic motivation for using this new family in practice is that the summed hazard function of its member distributions can accommodate different types of hazard functions. Consider the special case (the Σh Exponential-Lomax distribution) with the hazard function introduced in Equation 28: the resulting hazard function is the sum of the hazard functions of the Lomax and exponential distributions, and it therefore covers three hazard function patterns, as follows:
• A constant hazard rate, when the Lomax parameters equal 0 (the distribution reduces to the exponential)
• A decreasing hazard rate, when the parameter of the exponential equals 0 (the distribution reduces to the Lomax)
• A hazard rate that decreases as x increases and then levels off at a certain value (the Σh Exponential-Lomax distribution)
The preceding discussion shows that this family has a hazard rate that can fit different types of hazard functions in applications.
For further work, this distribution can be applied to censored lifetime data. Other estimation methods, such as Bayesian or least squares estimation, can also be used.