Generalized Statistical Inference for Quantiles of Two-Parameter Gamma Distribution

: Gamma distribution is a widely used distribution to analyze data in many disciplines such as hydrology, meteorology, environmental monitoring, lifetime testing and reliability. In this study, we look at the statistical inference for the quantiles of two-parameter gamma distribution. The testing and estimation of gamma quantile are required especially in areas such as flood frequency analysis and life testing. For this problem, all the statistical inference methods available in the statistical literature are approximate methods. In this study, we propose two methods to tackle this problem. The first method is an exact statistical inference procedure utilizing the generalized p-value technique. The procedure is exact in the sense that it is based on exact probability statements rather than based on approximations. The second procedure is based on the parametric bootstrap approach. We apply the proposed methods to several examples with real data sets and compare the results with other existing methods. A limited simulation study is given to compare the performance of the proposed methods with other existing methods. Overall, according to the simulation results, in terms of size and power, these two new methods perform well over the other existing methods whether it is related to lower or higher quantiles.


Introduction
The gamma distribution is a widely used distribution for the analysis of rainfall data, pollution data, environmental monitoring data and lifetime data.In the analysis of rainfall and hydrological data, the gamma distribution or gamma-related distributions (Aksoy, 2000;Ashkar and Bobèe, 1988;Ashkar and Ouarda, 1998), such as the Pearson type 3 and the log-pearson type 3 distributions are very frequently used.In exposure and pollution data analysis, gamma models are used as an alternative to lognormal models (Maxim et al., 2006).Gibbons (1995) shows applications with groundwater and environmental monitoring data.Husak et al. (2007) showed that the gamma distribution is a very good choice for modeling rainfall data.
In these areas of research, it is often required to estimate quantiles, tolerance limits and prediction limits of a gamma distribution.In this article, we consider statistical inference related to the quantiles of twoparameter gamma distribution.
Let X  G (, ), where both  and  are unknown parameters.Here  is the shape parameter and  is the scale parameter.The parameter of interest is the q th quantile q, which is defined as Pr (X q) = q, where 0< q <1 is a known constant.Using a random sample X1, X2, …., Xn, we want to construct a Lower Confidence Limit (LCL) and Upper Confidence Limit (UCL) and test the uppertailed hypothesis (or lower-tailed hypothesis): where,  is a known constant.
There are many papers written in the existing literature, regarding the estimation and testing of the gamma parameters and functions of gamma parameters.When both parameters are unknown, with few exceptions, almost all the statistical inference procedures available in the statistical literature are approximate procedures.Engelhardt and Bain (1977) gave a uniformly most powerful test to test the scale parameter .Bain et al. (1984) provided an approximate test for testing the gamma mean .Bhaumik et al. (2009) provided tests for the shape parameter  and the mean .Krishnamoorthy and León-Novelo (2014) provided approximate small sample likelihood-based statistical inference procedures for the gamma parameters and the mean.Bain et al. (1984) proposed an approximate tolerance limit for the two-parameter gamma distribution.Their method consists of replacing the unknown shape parameter  with the maximum likelihood estimate of it and utilizing an approximation that is accurate only for small values of q of the gamma distribution.They showed that their method provides satisfactory estimates for the LCL when the value of q is lower than 0.2.Their method works well with their applications since the lower tail is more important than the upper tail in reliability and industrial engineering applications.However, this is not the case with most applications that arise in hydrological applications and flood frequency analysis (Ashkar and Ouarda, 1998), the higher values of q are often the ones under scrutiny.Therefore, they are more interested in the values of q for higher values of q.For example, in flood frequency analysis, the 100-year flood is associated with the quantile q with q = 0.99; it is often desired to get LCL for q (Ashkar and Bobée, 1988).Furthermore, these types of data are highly skewed and the gamma distribution and gamma-related distributions are frequently used to analyze the data in these areas.In these areas of research, there are many papers in the literature related to estimation q; but they all use approximate methods.Ashkar and Ouarda (1998) gave an approximate method to construct confidence limits for q using transformed data.This method is frequently used in hydrology and flood frequency analysis to construct confidence intervals for q.Their method is briefly described in the next section.By utilizing the X 1/3 transformation, Krishnamoorthy et al. (2008) proposed an approximate method for testing and estimating the parameters and functions of gamma parameters.Their method is also briefly described in the next section.Aryal et al. (2008) showed that the two-parameter gamma distribution can be approximated by a normal distribution when the shape parameter is large.They suggested using normal-based tolerance limits if the maximum likelihood estimator of the  is more than 7. Otherwise, they provided tabular values to construct tolerance factors.Weerahandi and Gamage (2016) introduced a general method to tackle statistical inference problems for a twoparameter continuous distribution.Using this method, they showed that one can get an exact test and confidence intervals for the gamma parameters and the mean using the generalized p-value approach.In this article, we utilize their approach to find an exact statistical inference procedure for the quantiles of the two-parameter gamma distribution.The test is exact in the sense that it is based on exact probability statements rather than based on approximations.Furthermore, we introduce a parametric bootstrap method to carry out statistical inference of q.

Inference for Gamma Quantiles
The minimal sufficient statistics for  and  are given .These two statistics S and P are not independent, but a statistic T defined as a ratio of the geometric mean and the arithmetic mean is independent of S (Bhaumik et al., 2009).Therefore, the joint statistic (S, T) is defined by: ( ) ( ) provides an independently distributed joint sufficient statistic for the statistical inference regarding parameters  and .In the first two subsections, we summarize two leading statistical procedures and the confidence bounds for the gamma quantiles.

Materials and Methods
Ashkar and BobÈe Method Ashkar and Ouarda (1988) introduced an approximate procedure to construct the confidence limits using transformed data.We refer to this method as the AB method.Their approximate procedure is based on the fact that X is distributed as FX -1 (FY (Y); where X is a gamma random variable with the cdf FX and Y is a normal random variable with the cdf FY.They came up with the following approximately 100(1-)% LCL and UCL for q.
where, ( ) and: ( ) Here  (.) is the standard normal cdf, zq is the q th quantile of the standard normal distribution and ta;b (c) is the b th quantile of the noncentral t-distribution with degrees of freedom a and noncentrality parameter c.A two-sided approximate 100 (1-2)% confidence interval for q is given by (LCL, UCL), where LCL and UCL are given in (3).The values of Kp can be approximated using the formula: is the coefficient of skewness of the gamma distribution, which is estimated using the maximum likelihood estimate.This approximation works well when 2   and many other approximations are given for other values.The references for these approximations and tabulated values for Kp are given in Bobee and Ashkar (1991).Aryal et al. (2008) argued that the gamma distribution can be approximated by a normal distribution when the maximum likelihood estimator of the shape parameter of the gamma distribution is larger than 7. Krishnamoorthy et al. (2008) pointed out that the normal approximation appears to not be useful for statistical inference related to gamma parameters, but it is accurate for the computation of the prediction intervals, tolerance intervals and for statistical inference regarding stress-strength reliability parameters.Krishnamoorthy et al. (2008) investigated the accuracy and appropriateness of the X 1/3 transformation suggested by Wilson and Hilferty (1931) to analyze the gamma data.They showed that the X 1/3 transformation works much better than the X 1/4 transformation which is known as the Hawkins and Wixley (1986) approximation.After transforming the data, they utilized the available normal tolerance intervals (Guttman, 1967) to carry out the inferences for the gamma data.Overall, they concluded that the X 1/3 transformation provides a simple, easy-to-use approach for dealing with various problems related to the gamma distribution.In our analysis, we use the X 1/3 transformation approach to analyze the data.From here on we will refer to the method associated with the X 1/3 transformation as the Normal Approximation (NA) method.

Normal Approximation Approach
With the normal approximation method, approximate 100 (1-)% LCL and UCL for the gamma quantile q is given by: where, Here c1 and c2 are given by: where, zq is the q th quantile of the standard normal distribution and ta,b (d) is the b th quantile of the noncentral t-distribution with degrees of freedom a and noncentrality parameter d.The two quantities c1 and c2 are referred to as tolerance factors.The two-sided approximate 100 (1-2)% confident interval for q is given by (LCL, UCL), where LCL and UCL are given in (4).
Statistical inference for the tolerance limits and confidence intervals go hand in hand.To describe this, let q denote the "content" and 1- denote the coverage probability of a tolerance interval.Then the Upper Tolerance Limit (UTL) for the two-parameter gamma distribution with content q and the coverage probability 1- is given by the UCL that is given in 4.
The Lower Tolerance Limit (LTL) with content q and the coverage probability 1- can be obtained by replacing the plus sign with the minus sign in the UCL formula, i.e.,

Generalized Inference Procedure
To describe the method introduced by Weerahandi and Gamage (2016), let X1, X2,…., Xn be a random sample from a two-parameter continuous density ( ; , ) X f x  with a joint minimal sufficient statistics (S, T).Both parameters are assumed to be unknown and let be  the parameter of interest.Let FT (t; , ) = Pr(T  t) be the cdf of T. Then U = FT (T) has a uniform distribution over 0 and 1 and ( ; )  and T. At the observed value t of T, if u (t) is the value of u, then u (t) = Ft (t; , ) For a fixed value of t, this is a function of  and when treated as a function of  , u () = FT (t; , ).Let u -1 be its inverse function satisfying the equation u -1 (u()) = .Now, define the random variable Rb (T; , , t) = u -1 (U (T)), which satisfies the following two properties: 1.At the observed value t of T, Rb (t;, , t) =  and 2. The distribution of Rb is free of the parameter  The random quantity Rb (T; , , t), at times, is denoted as ˆ (U )  since this is the quantity that is used to replace  the construction of the generalized pivotal quantity.Now, let / ( ; , )

S T t
Fs  = be the cdf of the conditional distribution of S given T = t.Let us denote it by W (S, t) = FS/T = t (S) and it is distributed as W  U (0, 1).This conditional distribution does not depend on t.Therefore, the unconditional distribution of W (S, T) is U (0, 1) and it is independent of T.
For the testing and estimation , Weerahandi and Gamage (2016) defined a Generalized Test Variable (GTV) and a Generalized Pivotal Quantity (GPQ) using the random quantity: where, W and U are independent uniform U  U (0, 1) random variables.
If the cdf of R is monotonically increasing with respect to the parameter of interest , then the hypothesis 00 : H   can be tested using the p-value: , where, the expected value is taken with respect to the uniform U (0, 1) random variable.

Generalized Inference Procedure for Gamma Quantiles
Now let us consider the statistical inference for the q th quantile q of the gamma distribution G(, ); where both  and  are unknown parameters.Here  is the shape parameter and  the scale parameter.The joint statistics (S, T) defined in ( 2) is an independently distributed complete minimal sufficient statistic for  and  .
Let F,  (.) Denote the cdf of the gamma G(, ) distribution.Then F,  (q) = q and: 0 () The parameter of interest q = q (, ) can be written as: is the inverse cumulative distribution function of G(, 1) i.e., the q th quantile of the gamma distribution G(, 1).
Since q is the parameter of interest, let us reparametrize the problem with parameters q and ; treating  as the nuisance parameter.Now X  G (, q/) and S  G(n, q/).
Furthermore, the statistic T is scale-invariant and the distribution of T does not depend on  , it only depends on  .Therefore, we can use it to estimate the nuisance parameter  .Following the technique introduced by Weerahandi and Gamage (2016), let FT (t; ) = Pr (T  t) be the cdf of T. Then U = FT(T; ) has a uniform distribution over 0 and 1 and U = U(T; ) is a function of T and .In general, U could be a function of T and both unknown parameters  and .At the observed value t of For a fixed value of t; this is a function of  and when treated as a function of , u() = FT(t; ), Let u -1 be its inverse function satisfying the equation u -1 (u()) = .Now, define the random variable Rb(T; , t) = u -1 (u(T)), which satisfies the following two properties: , ( ; , ) .As before, if we denote it by () , W is distributed as U(0,1).Since S and T are independent, W (S, T) and T are independent.Now we can de ne a generalized pivotal quantity for as: where, ( , ; , ) q W W S T  = and U = U (T) are independent uniform U (0, 1) random variables.The value of R is equal to one at the observed values s of S and t of T. Furthermore, the distribution of R is free of the nuisance parameter  and therefore R is a generalized pivotal quantity for q.
The cdf of the R is monotonically non-decreasing with respect to the parameterq and the hypothesis H0 :q   against H1 :  >  can be tested using the p-value.
( ) ( ) where, the expected value is taken with respect to the uniform U (0, 1) random variable.
Confidence intervals and tolerance limits can be constructed using the generalized pivotal quantity R. The R depends only on q and any percentile of the distribution of R is only a function of q.Furthermore, the generalized pivotal quantity R is a monotonically non-decreasing function of q.Therefore, the LCL for a 100 (1-)% upper confidence interval for q would be the  th percentile of the distribution is equal to one.Similarly, the UCL for a 100(1-)% lower confidence interval for q would be the the (1-) th percentile of the distribution is equal to 1.Moreover, the interval (LCL, UCL) is a two-sided 100 (1-2)% confidence interval for q.The computations of LCL and UCL can be carried out with simulation by generating a large number of uniform random numbers from U and W.
The Upper Tolerance Limit (UTL) for the gamma distribution with content q and the coverage probability 1- is equal to the UCL stated above.The Lower Tolerance Limit (LTL) with content q and the coverage probability 1- can be obtained by constructing 100 (1-)% LCL for the (1-q) th quantile 1-q.

A Parametric Bootstrap Approach
In this section, we outline another approach utilizing the parametric bootstrap approach.With the parametric bootstrap, inference is done by generating random samples from an estimated model, which is achieved by replacing the unknown parameters with their estimates.In order to establish the estimated model, we utilize the maximum likelihood estimates of  and .
For gamma quantiles, confidence limits can be established as follows: Step 1: For sample data x1, x2,…., xn, first compute the maximum likelihood estimates of ,  and q.
Let; ,  and q  be those maximum likelihood estimates respectively Step 2: Generate a large number of bootstrap samples (say N = 5000) from the gamma distribution with the shape parameter  and the scale parameter


. For each of these samples, find the maximum likelihood estimate of q and let q ,i  (i = 1, 2, N) be those estimates Step 3: The approximate 100(1 )% − Lower Confidence Limit (LCL) for q is given by the th  percentile of the estimates 1,2,..., With respect to the tolerance limits, the Upper Tolerance Limit (UTL) for the gamma distribution with content q and the coverage probability 1- is given by the UCL as described above.The Lower Tolerance Limit (LTL) with content q and the coverage probability 1- can be obtained by evaluating the 100 (1-)% LCL for the (1-q) th quantile.
For the hypothesis testing, the upper-tailed hypothesis given in (1) can be tested at the  level by rejecting H0 if the (1-) t percentile of the estimates , ( 1,2,..., ) qi ˆiN  = is larger than the null hypothesized  value.Similarly, the lower-tailed hypothesis can be tested at the  level by rejecting H0 if the  th percentile of the estimates , ( 1,2,..., ) qi ˆiN  = is less than the null hypothesis  value.However, since the hypothesis testing is done under the assumption of null hypothesis, one can obtain a better parametric bootstrap procedure for testing as follows.
We can reparametrize the problem in terms of (, q  ) instead of (); where q is the q th quantile.For given values of q and ; let  *(, q  ) be the solution of  for Eq. ( 5).
This solution has to be evaluated numerically.By replacing  with  *(, q  ); the likelihood function can be written as: where, S and P are the sufficient statistics given in (2).Under the null hypothesis in (1), assume the worst possible value q  which is q = .Then the likelihood function has only one unknown parameter.The maximum likelihood estimate of this unknown parameter  say m can be obtained by maximizing the log-likelihood function: Here *  represents * (, ).
Now bootstrapping can be done by generating a large number of datasets from the gamma distribution with the shape parameter *(, q  ) and the scale parameter.
Testing the hypothesis given in (1) can be done as follows.
Proposed parametric bootstrap testing procedure (referred to as PB method): Step 1: For sample data x1, x2,…., xn, compute the maximum likelihood estimate of  and  the maximum likelihood estimate of q  , q  Step 2: Under the null hypothesis (1), using the sample data x1, x2,…, xn, compute m  by maximizing (6) and then compute the corresponding shape parameter  .For each of these samples and find the maximum likelihood estimate q; as outlined in step 1.Let ( 1,2,..., ) q ˆiN  = be these estimates.Compute the  th and (1-) th percentiles of the estimates ( 1,2,..., ) q ˆiN  = . Let LC be the  th percentile and UC be the(1-) th percentile respectively Step 4: For the upper-tailed hypothesis given in (1), reject the null hypothesis at -level if q  is larger than the UC.Similarly, the lower-tailed hypothesis can be tested at -level if q  is smaller than the LC Instead of reparametrizing the problem with (, q  ), one could approach the problem by reparametrizing in terms of (, q  ) and try to take the computational advantage coming from Eq. ( 5).It certainly works but runs into computational issues when the shape parameter  is large.
Some similarities exist between the Generalized Test Variable (GTV) approach and the parametric bootstrap methods.In the case of one-way anova, Weerahandi and Krishnamoorthy (2019) showed that the parametric bootstrap procedure can be derived as a GTV method by considering an appropriate test variable.In the case of two-way ANOVA, Ananda et al. (2023) showed that the parametric bootstrap procedure can be derived as a GTV method.For a complete coverage of generalized test variable approach along with applications, see the books by Weerahandi (1995;2004).

Examples
In this section, we will apply these methods to several real data sets.In hydrological studies, researchers are often interested in confidence intervals for upper quantiles values and our rest example is from this area.

Example 2. Alkalinity Concentrations in Groundwater
The 90% confidence intervals for .1 using AB, NA, GM and PB methods yield (0.092, 0.384), (0.067, 0.379).(0.084, 0.378) and (0.114, 0.430) respectively.Note that the Upper Tolerance Limits (UTL) with a content of .95 and coverage probability of 0.9 using the methods AB, NA, GM and PB are given by 96.875, 97.705, 97.812 and 93.178, respectively.Similarly, the Lower Tolerance Limits (LTL) with a content of .95 and coverage probability of 0.9 using the methods AB, NA, GM and PB are given by 28.805, 28.343, 28.180 and 30.376 respectively.

Results
In this section, we provide a limited simulation study to compare the performance of the AB, NA, GM and PB methods.
The rest simulation study is to evaluate the size performance of these methods to test the lower and upper hypotheses stated in (1).We use the following parameters and sample size configurations:  = 0.5, 1.0, 1.5, 5.0,  = 1.0, 5.0, n = 10, 20.Under each of these configurations, using 5000 simulated samples, the lower and upper hypotheses stated in (1) were tested for the quantile levels q = 0.1, 0.3, 0.5, 0.7, 0.9.For all these size calculations, the intended level is set at 0.05.The actual sizes, i.e., the simulated rejection rates were calculated and the results are reported in Tables 1-2.As explained in chapter 2, the calculations of the GM and PB methods involve simulations and each of these simulated sample sizes is kept at 5000.

Discussion
According to these simulations, when  is small and q is large, both the NA and AB methods have highly elevated type 1 error rates for the lower-sided tests.When gets large, the disadvantage diminishes with the NA method.Notice here that when  and q are both small, the NA test tends to be too conservative.Overall, with respect to type 1 error rates, the PB test performs well for all of the scenarios considered.Next, we carry out a limited simulation study to compare the powers of these four tests.The power comparison is done without adjusting the size.These tests have different type 1 error rates and ideally, the sizes of these tests need to be adjusted to the same level for such a comparison.Adjusting the size to the same level is a time-consuming task and at this point, it is beyond the scope of our article.
We select n = 10,  = 5.0 to run the power comparison.For  and q, the choices  = 0.5, 1.0, 1.5, 5.0 and q = 0.1, 0.3, 0.5, 0.7, 0.9 were selected.The power comparison is done for the lower and upper-sided hypotheses stated in (1).Furthermore, in order to display the selected choices for  that species in (1), we introduce the ratio c = q.For the given values of c and q, the values of  were selected.The simulation was done under these parameter configurations and the rejection rates are reported in Tables 3-6.
When c = 1, by design the null hypothesis is true and the rejection rates are the actual size (type 1 error rates) for the test.When c>1, the alternative hypothesis is true and the choice is associated with a lower-tailed test and the rejection rate is the power of the test for that choice.Similarly, when c<1, the rejection rate is the power of the upper-tailed test associated with that choice.Tables 3-6 correspond to the choices of  = 0.5, 1.0, 1.5, 5.0, respectively.
According to these limited power simulation results, the following performance patterns were observed.

For the Lower Tailed Tests
When  and q are small, both the GM and PB methods perform well.In these cases, in terms of the size, both the NA and AB tests are too conservative.When  gets larger, the size disadvantage goes away with the NA method.
When  small and q is large, the power of the NA and AB tests is higher than the GM and PB methods.However, in these cases, the NA and AB methods have highly elevated type 1 error rates.When  gets larger, the size disadvantage goes away with the NA method.

For the Upper Tailed Tests
When  and p are small, the AB test possesses good power performance.However, in this case, the AB test has an elevated type 1 error rate.Even with large  values, the AB test possesses elevated type 1-error rates.When  and p are small, the NA method performs poorly.But when  gets larger, the NA method's performance becomes better.
When  small and p is large, the performance of all methods is somewhat similar.

Conclusion
Statistical inference on gamma quantiles has very useful applications in many disciplines such as hydrology, environmental monitoring and life testing.However, all of the statistical inference procedures available in the statistical literature are approximate procedures due to the complexity of the structure of the gamma distribution.According to the simulation results, in some cases, the type 1 error rates of the approximate procedures are either too conservative or too liberal depending on whether it is related to the low or high quantiles.In this study, we introduced two new methods, one utilizing the generalized p-value technique (GM) and the other utilizing the Parametric Bootstrap technique (PB).Both of these new procedures perform well over the entire range of the quantiles q, 0< q <1.Overall, by taking into account both size and power performance, the GM method and the PB method perform better than the other two procedures whether it involves lower or higher quantile values or whether it involves the lower or uppertailed tests.However, the PB method has advantages over the GM method, due to its computational simplicity.
the standardized gamma variate with the coefficient of skewness .The coefficient of skewness for the gamma distribution 2 /  = is estimated using the maximum likelihood estimate of .The values of p1 and p2 are given by:
The distribution of Rb is free of the parameter The quantity (T; , ) b Rt  will be used to replace  the construction of the generalized pivotal quantity and therefore let us denote it by( ) (UCL) for q is given by the (1 ) th − percentile of the estimates , ( 1,2,..., ) Generate a large number of bootstrap samples (say N = 5000) from the gamma distribution with the shape parameter *( m  ,) and the scale parameter m

Table 2 :
Type I error performance of competing tests when  = 5