New Families of Distributions for Modeling Bivariate Data, with Applications

Corresponding Author: Osama Mohareb Khaled Department of Math, Faculty of Science, Port said University, Port Said, Egypt Email: osam87@yahoo.com Abstract: In this study we introduce a new method of adding two shape parameters to any baseline bivariate distribution function (df) to get a more flexible family of bivariate df's. Through the additional parameters we can fully control the type of the resulting family. This method is applied to yield a new two-parameter extension of the bivariate standard normal distribution, denoted by BSSN. The statistical properties of the BSSN family are studied. Moreover, via a mixture of the BSSN family and the standard bivariate logistic df, we get a more capable family, denoted by FBSSN. Theoretically, each of the marginals of the FBSSN contains all the possible types of df's with respect to the signs of skewness and excess kurtosis. In addition, each possesses very wide range of the indices of skewness and kurtosis. Finally, we compare the families BSSN and FBSSN with some important competitors (i.e., some generalized families of bivariate df's) via real data examples. AMS 2010 Subject Classification: 62-07; 62E10; 62F99.


Introduction
Multivariate data is obtained when we observe more than one statistical outcome variable at a time. Therefore, it consists of two or more observed random variables (rv's) that are usually correlated. The most appropriate and commonplace multivariate distribution that can describe the multivariate data is the multivariate normal distribution. Most likely, the popularity of normal distribution is due to the ease of simulation and the possibility of deriving closed-form theoretical results. It is rare, however, that the multivariate normal distribution can describe the given multivariate data. Virtually, most of the obtainable multivariate data are skewed and heavy tailed. So we are in bad need to nonnormal multivariate df's to describe such data. Examples of random phenomenona that are governed by non normal multivariate df's would be lengths of time among malfunctions in machinery, waiting times and growth data such as bacterial growth.
Actually, the non-normal multivariate df's are needed in several cases. First, when the marginal df's are not normal. Second, when the contour of the constant density is not ellipse. Third, the conditional expectations are not linear regressions. Last, the variances and covariances of conditional df's are dependent of the values of the conditioning variables. Although the roots of multivariate distribution theory lie in univariate distribution theory, the extension to the multivariate domain introduces additional concepts and issues of particular relevance.
Despite the paucity of studies tackling the generalized families of multivariate df's in comparison to the studies, that tackle the generalized families of univariate df's, many authors have considered and studied the multivariate df's that describe non-normal multivariate data. Among those authors are Cook and Johnson (1981;1986). They found that non-normal df's with normal margins would be appropriate for some bivariate geological data. Early work goes back to Edgeworth (1896), where families of bivariate distributions are derived as series involving the bivariate probability density function (pdf) and all its partial derivatives as well as the cumulants of the distribution (cf. Mardia, 1970). In our study, we mainly focus on studying the bivariate models. Generally, a bivariate df is better suited for practical use when it has the following properties: • The bivariate df is mathematically tractable, i.e., we can derive its distributional characterizations, e.g., moments, cross moments, median vector, mode vector, marginal skewness and kurtosis. Moreover, via the parameters of the bivariate df, we can control some of its characterizations • The bivariate df is flexible to model various types of data by possessing a wide range of the indices of the marginal skewness, the marginal kurtosis and the correlation coefficient • The bivariate df is motivated by important applications In this study we suggest a method of adding two parameters to any base bivariate df to obtain a new generalized family of bivariate df's, which is more flexible to fit bivariate data. This new method is based on the mixture of the df's of some random vectors (X 1 , X 2 ) and (-X 1 , -X 2 ). We apply this method on the bivariate standard normal distribution, denoted by BN, to yield an extension of the bivariate normal distribution, which can serve as a strong competitor of many known non-normal bivariate distributions. For example, the new extension of the bivariate normal distribution contains, beside the normal df itself, positively-negatively asymmetric and leptokurticplatykurtic marginals. Moreover, through the added two parameters, one can fully control the skewness, kurtosis of the marginals of this family, as well as its correlation coefficient. Briey, the suggested new family of bivariate df's satisfies the above mentioned preferable properties 1-3.
The rest of the paper unfolds as follows. Section 2 briey outlines two important nonnormal bivariate df's. In section 3, we introduce and study a new suggested family of bivariate normal df, denoted by BSSN. Moreover, we show that the BSSN family is capable of describing many types of statistical bivariate data. Finally, we suggest a more competent family, which is an extension of the BSSN family, denoted by FBSSN. Theoretically, each of the marginals of the FBSSN family contains all the possible types of df's with respect to the signs of skewness and excess kurtosis and possesses very wide range of the indices of skewness and kurtosis. In section 4, the AIC and BIC criteria are used to compare the families BSSN, FBSSN, Bivariate Tukey g-h (denoted by BT-g-h), Bivariate Skew Normal (denoted by BSN), Bivariate Mixture Normal (denoted by BMN) and the usual Bivariate Normal df (denoted by BN) by means of three real bivariate data sets, concerning the air pollution. Section 5, is devoted to a brief discussion.

Two Important Non-Normal Bivariate df's
In this section we review the bivariate g-and-h df, which is introduced by Kowalchuk and Headrick (2010) and the bivariate skew normal df (Azzalini, 2005).

Bivariate g-and-h Distribution
The g-and-h univariate distribution was suggested by Tukey (1977) and discussed by Hoaglin and Peters (1979) and Hoaglin (1983). This distribution is defined by transforming the standard normal variable Z to: where, g is a real constant controlling the skewness and h is a nonnegative real constant controlling the kurtosis, or elongation (for the definition of g-and-h distribution when h ∈ ℝ, see Martinez and Iglewicz, 1984). It can be shown that T 0,0 (Z) = Z is the normal one, the distribution of T 0,h (Z) has heavier tails than normal with the location-scale log-normal distribution. The sign of g controls the direction of skewness but not the amount. Positive values of g skew the distribution to right tail while negative values of g skew the distribution to the left tail. Therefore, this family of df's encompasses a remarkable variety of distribution types. Moreover, since T g,h (z) increases monotonically in z, the df of X can be written as F X (x) = ( ) where Φ(.) denotes the standard normal df. Clearly, this distribution is not easily mathematically tractable, because it does not have simple expression for density. However, Martinez (1981) derives the nth moment about the origin, when g ≠ 0 and 1 0 h n ≤ < , as: A common approach for generating correlated g-andh distributions uses the Equation 2.1 with bivariate normal data with a specified correlation structure (see Wilcox, 1995;Wilcox and Keselman, 2001;Wilcox et al., 2000). Namely, a random vector 2 X ∈ ℝ is said to have a standard bivariate g-and-h distribution,

Bivariate Skew Normal Distribution
The term Skew-Normal (SN) df refers to a parametric class of df's which includes the standard normal as a special case. A rv Y is said to be skew-normal with parameter λ, written Y∼SN(λ), if its density function (cf. Azzalini, 1985;Gupta et al., 2002) is: where, φ(.) denotes the standard normal density function. The parameter λ regulates the skewness and λ = 0 corresponds to standard normal df. It is worth mentioning that the rv Y itself may be interpreted as the following transformation of normal rv's: where Z 1 and Z 2 are two independent rv's following the standard normal distribution. The advantage of this distribution family is that it persists many statistical properties of the normal distribution, in particular, Y 2 ∼ 2 1 χ . The SN df enables us getting rid of the assumption that the underlying population distribution is symmetric. Such an extension is necessary because in practice, the underlying distribution may be well skew rather than being symmetric.
When applying the SN df in statistical inference, we frequently need to study the joint df of a random sample from the population. Therefore, the study of multivariate SN df is of a considerable interest. Azzalini (1985) introduced the following bivariate extension of the density in (2.3). A random vector 2 Y ∈ ℝ follows a bivariate SN, denoted BSN 1 , if the joint density of Y has the following form: where, λ 1 , 2 λ ∈ ℝ, c is the normalizing constant and ( ) ; y φ Ω is the density of a bivariate normal distribution generally not the correlation matrix of Y . Clearly, the df defined in (2.4) is just a direct and formal extension of the univariate SN distribution defined in (2.3). In Azzalini and Dalla Valle (1996) pointed out the disadvantage of the model defined in (2.4) and suggested the following version for bivariate SN distribution, denoted by BSN 2 . A random vector 2 V ∈ ℝ follows BSN 2 distribution if the joint density of V has the following form: (2.5) Azzalini and Dalla Valle (1996) and Azzalini and Capitanio (1999) showed that the model (2.5) has some interesting statistical properties such as the quadratic form of a skew normal random vector is 2 2 χ , along with a probabilistic interpretation on the basis of the stochastic representation. However, in the definition is Ω not the correlation matrix of V. Moreover, Ω = I (where I is the identity matrix of order 2×2) does not imply that all components of V are independent. In a series of articles, Gupta and Chen (2001;2004) pointed that neither of the bivariate skew-normal models (2.4, 2.5) cohere with the joint distribution of a random sample from a univariate skew-normal distribution. They introduced an alternative bivariate skew-normal model which overcomes this problem.

Bivariate SS-Normal Distribution and the Full Family
It is known that the mixture model, which is a convex combination of two pdf's is a powerful and flexible tool of modeling complex data because it combines the properties of the individual densities. In Barakat (2015) suggested a univariate generalized distribution family, which is simpler than the most well-known families. This is because it is a mixture of a baseline distribution and its reverse (i.e., the distribution of the negative rv), after adding a location parameter to the baseline distribution and adding the same location parameter with a different sign to the reverse of the baseline distribution. By this way the utilized location parameter turns to a shape parameter and we get a two-parameter generalized distribution family. This family has an individual trait, on which it has been built, that if this family contained any distribution, it should contain also its reverse distribution. This trait makes us name this family as the stable. Barakat (2015) suggested a new univariate generalized family, denoted by SSN, by taking the standard normal distribution as the baseline distribution of the stable symmetric family. Barakat (2015) showed that the SSN family contains all the possible types of df's (there are nine possible types, see Barakat, 2015. Moreover, any family that contains these nine types is called full family, cf. Barakat and Khaled, 2017), except the type "symmetric and leptokurtic", denoted by 0+; i.e., positive excess kurtosis or the df has a more acute peak around the mean and fatter tails than normal df. Moreover, the SSN family has a very remarkable wide range of the indices of skewness and kurtosis. Therefore, it is capable of describing more types of statistical data than many other known families.
Bivariate SS-normal distribution Barakat's method (2015) can be applied on any baseline bivariate df to get the following bivariate stable symmetric family: The survivor function of the bivariate df (3.1) is given by: Let us now apply (3.2) on the standard bivariate normal distribution Φ ρ (x), with correlation coefficient ρ. Then, a new skew bivariate normal distribution, denoted by BSSN, or W = (W 1 , W 2 ) ∼ BSSN, is defined by: The moment generating function can be easily derived by using (3.3) in the form: where: The following lemma, which is directly obtained by using (3.4), gives the marginal means, marginal variances, marginal coefficients of skewness and kurtosis of the BSSN family (3.3)

Lemma 3.1
For each of the marginals of the BSSN df, we have: Clearly, if c = 0 the BSSN family becomes the standard bivariate normal df for any α. Thus, apart from the value c = 0, Lemma 3.1 shows that the parameter α solo controls the marginal kurtosis of the BSSN family. On the other hand, both the parameters α and c control the marginal skewness of the BSSN family in the following manner: Each of the marginals of BSSN family has the type: , . .,00, 0, 1 , . .,0 , , 2 3 3 , . ., 0, , 0 It is worth noting that, each of the marginals of BSSN df has eight types out of the nine possible types of df's. Namely, the only excluded type from the BSSN family is the type 0+. Moreover, in view of the results of Barakat (2015) the BSSN family possesses a very wide range of the indices of marginal skewness and kurtosis. The following interesting theorem shows that the BSSN family is not only tractable but we can easily control its correlation coefficient.

Theorem 3.1
The correlation coefficient, denoted by ρ W , of the BSSN family is given by: where, c 1 = µ 1 -cσ 1 , c ★ 1 = µ 1 + cσ 1 , c 2 = µ 2 −cσ 2 and c ★ 2 = µ 2 + cσ 2 . Moreover, in any fitting problem we should get the estimates and 2 σ . Clearly, the estimation problem for these parameters is strongly relevant to the problem of estimating mixtures of bivariate normal distributions. Actually, many efficient methods were discussed by many authors, among them we mention Quandt and Ramsey (1978) and Lin et al. (2007). Barakat and Khaled (2017) called any generalized family that contains all the possible types of df's (nine types), a full family. As we have seen before the BSSN family has eight marginal types out of the nine possible types of df's. Namely, the only excluded marginal type from the BSSN family is the type 0+. Since the mixture operation is linear with respect to all the moments, we can suggest a full bivariate family of df's by mixing the BSSN family with any tractable symmetric leptokurtic df, namely the standard bivariate logistic df suggested by Gumbel (1961). The standard bivariate logistic df is defined by: Therefore, given three parameters 0 ≤ α, β ≤ 1 and c ∈ ℝ, the full family of df's is given by:

Further Extension for BSSN Family
The moment generating function can be easily derived by using (3.8) in the form: , 1,2, ) and: Note that, in (3.10) if β = 1, then ρ η = ρ W . Moreover, Clearly, each marginal of the FBSSN family possesses a very wide range of the indices of skewness and kurtosis (at least not less than the BSSN family). Moreover, each marginal of the FBSSN family contains all the possible types of df's. Therefore, the FBSSN family provides a general and flexible mechanism for fitting a wide spectrum of real world bivariate data set.

Modeling of Air Pollution
Air pollution is a very important environmental and social issue. In addition, it is a complex problem posing multiple challenges in terms of management and reduction of harmful pollutants. Air pollutants are released from anthropogenic and natural sources.
In this section we consider three daily data sets for air pollution from the London Air Quality Network (LAQN), from 01/01/2010 to 31/12/2016. This network is a united resource for air pollution measurements, which are fundamental to support air quality administration. The LAQN was framed in 1993 to arrange and enhance air pollution checking in London. People living in the city of London can check the updates of the level of air pollution data through the web site "www:londonair:org:uk". We consider the following three pollutants: The Nitric Oxide (NO), the Nitrogen Dioxide (NO 2 ) and the Sulphur Dioxide (SO 2 ). These data sets can be downloaded by any researcher in the form of a report every half hour, every hour or every day according to the type of study from the following site "www.londonair.org.uk/london/asp/datadownload.asp ". The summary statistics for these data sets are given in Table 1.
In this study, we compare the location-scale BSSN and FBSSN families (3.6) and (3.11) with the location-scale BT-g-h family (where σ > 0 and the rv X is defined by (2.2)), the location-scale BSN2 family , (where its pdf is defined by (2.5)), the location-scale bivariate mixture normal df ) and the location-scale bivariate normal df BN(z; ρ, µ, σ) = . The estimates of the parameters of these families are calculated using the R package as follows: For BT-g-h we use ghyp package, for BSN2 we use sn package, for BN families we use stats package, for BSSN, BMN families we use maxLik package and for FBSSN we use maxLik and VGAM packages. These estimates are summarized in Tables 2-4. To compare the fittings of these families we used Akaike Information Criterion (AIC) (Akaike, 1973) and the Bayesian Information Criterion (BIC) (Schwarz, 1978). These criteria are based on the likelihood value of the model, the number of observations and the number of parameters thereof. The comparison results are summarized in Table 5-7. These tables show that the FBSSN family has the smallest values of the AIC and BIC criteria in the three tables, followed by the BSSN family. On the contrary, the families BT-g-h and BSN 2 have the largest values of those criteria.             Therefore, the best model that fits these data sets is the FBSSN family and the second model is the BSSN family. On the other hand, the worst two models that fit these data sets are BT-g-h and BSN 2 families.

Concluding Remark
In this study, we adhered to the most agreed upon theoretical conditions that make any family of bivariate df's to be capable of fitting many different types of bivariate data. According to these theoretical conditions we constructed the BSSN family and extended it to the FBSSN family, which is more a capable family to describe many bivariate data sets of different types. We applied the constructed families with other popular families on three bivariate data sets concerning three air pollutants. We used the AIC and BIC criteria to compare the fittings of these families. The result of the comparisons shows that the FBSSN family is the best followed by the BSSN family.