Bayesian Approach to the Estimation of a Poisson Mean with Application to NBA Three-Point Attempts

Email: patricia.williamson@case.edu

Abstract: Bayesian credible intervals are proposed for a Poisson mean and compared with five classical confidence intervals found in the literature. A simulation study compares the procedures using two different criteria and attempts to determine which procedure performs best for various values of the parameter and the sample size. Estimation of the mean numbers of three-point shot attempts and three-point shots made per game by the San Antonio Spurs is given as an example.


Introduction
During the 2011-2012, 2012-2013 and 2013-2014 regular seasons, the San Antonio Spurs were in the top three in team three-point shooting percentage. Random phenomena that produce counts are often modeled by a Poisson distribution, so the number of three-point attempts and the number of three-pointers made in a game are natural candidates for such a model. For each of these counts, it seems reasonable to make two assumptions. First, at most one three-point attempt (or make) can occur in a very small interval of time. Second, the numbers of attempts (or makes) in two equal, nonoverlapping time intervals are independent and identically distributed. Under these assumptions, the Poisson distribution seems an appropriate model for each count.
Many confidence intervals for a Poisson mean have been proposed and compared in the literature. A sample of such work includes Sahai and Khurshid (1993), Schwertman and Martinez (1994), Barker (2002), Byrne and Kabaila (2005), Patil and Kulkarni (2012), Khamkong (2012) and Tanusit (2012). Most of these include the familiar Wald interval, the scores interval and the Garwood interval, which are considered in this paper.

- Schwertman and Martinez (1994) produce tables for six approximate intervals and the exact Garwood interval, covering a range of observed Poisson values and several confidence levels.
- Barker (2002) compares a total of nine confidence intervals using coverage rate, expected length and whether or not the interval is in closed form.
- Byrne and Kabaila (2005) compare at least twelve approximate and exact confidence intervals using coverage rate and length. One of these is a short exact interval, computed by an algorithm, that was first proposed in Kabaila and Byrne (2001).
- Patil and Kulkarni (2012) compare 19 confidence intervals using several criteria, including coverage rate and expected length.
- Sahai and Khurshid (1993) give several nice biomedical and epidemiological examples that could be modeled by the Poisson distribution. They also review the methodology for ten confidence intervals. Each of these, including the Wald, scores and Garwood intervals, is in the comparative study of Patil and Kulkarni (2012).
- Tanusit (2012) compares seven confidence intervals using coverage rate and length. Three of these adapt common intervals by replacing the usual point estimator of the Poisson mean (the sample mean) with a Bayes estimator based on the conjugate Gamma prior. A criterion is given for choosing the prior parameters.
Khamkong (2012) compares four confidence intervals using coverage rate and estimated expected length; the scores interval is considered along with a proposed adapted Wald interval which outperformed the other intervals for small mean and small to moderate sample sizes.
Assume X has a Poisson distribution with mean θ (Poisson(θ)). For large θ, X is approximately normal with $\mu = \theta = \sigma^2$ (N(θ,θ)). For integer θ, this follows by taking $X_1, \ldots, X_\theta$ to be a random sample from a Poisson(1) distribution, letting $X = \sum_{i=1}^{\theta} X_i$ and applying the Central Limit Theorem. In fact, the normal approximation works quite well for values of θ as small as 25.
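The claim about θ = 25 can be checked numerically. The sketch below compares the Poisson(θ) CDF with the N(θ,θ) CDF at every integer and reports the largest absolute gap. The continuity correction (+0.5) and the function names are illustrative choices, not taken from the paper.

```python
import math


def poisson_cdf(k: int, theta: float) -> float:
    """P(X <= k) for X ~ Poisson(theta), with pmf terms computed on the log scale."""
    return sum(
        math.exp(j * math.log(theta) - theta - math.lgamma(j + 1)) for j in range(k + 1)
    )


def normal_cdf(x: float, mu: float, var: float) -> float:
    """CDF of N(mu, var) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def max_cdf_gap(theta: float) -> float:
    """Largest |P(X <= k) - Phi((k + 0.5 - theta) / sqrt(theta))| over k.

    The +0.5 is a continuity correction, an assumption of this sketch.
    """
    upper = int(theta + 10 * math.sqrt(theta)) + 10  # far enough into the right tail
    return max(
        abs(poisson_cdf(k, theta) - normal_cdf(k + 0.5, theta, theta))
        for k in range(upper + 1)
    )


if __name__ == "__main__":
    for theta in (5, 25, 100):
        print(theta, round(max_cdf_gap(theta), 4))
```

The gap shrinks roughly like $1/\sqrt{\theta}$, consistent with the skewness $1/\sqrt{\theta}$ of the Poisson distribution.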
When θ is not small, assuming $X_1, \ldots, X_n$ is a random sample from a Poisson(θ) distribution is essentially equivalent to assuming a random sample of size n from a N(θ,θ) distribution. Estimating θ from a N(θ,θ) sample is an interesting problem in itself, and it is more attractive still because it applies to estimating a Poisson mean. Let $\bar{X}$ and $S^2$ denote the sample mean and sample variance. For interval estimation of θ under normality, three pivots are available:

- the chi-square pivot $Q_1 = (n-1)S^2/\theta$;
- the t pivot $Q_2 = \sqrt{n}(\bar{X} - \theta)/S$;
- the standard normal pivot $Q_3 = \sqrt{n}(\bar{X} - \theta)/\sqrt{\theta}$.

$Q_3$ is also a reasonable pivot when sampling from a Poisson(θ) distribution, because it is then approximately N(0,1) for large n. Additionally, Garwood (1936) gives an exact interval for the Poisson mean that is valid for any value of n.
As alternatives to the classical intervals above, two Bayesian credible intervals are proposed. The first uses the Jeffreys prior assuming a N(θ,θ) distribution. The second uses the Jeffreys prior assuming a Poisson(θ) distribution.
The rest of the paper is organized as follows:

- Section 2 gives the classical confidence intervals based on pivots $Q_1$, $Q_2$, $Q_3$ and $Q_4$, along with the exact confidence interval first derived by Garwood (1936).
- Section 3 derives the two new Bayesian credible intervals.
- Section 4 adapts these interval estimators to a single Poisson(θ) random variable when θ is large.
- Section 5 reports a simulation study that compares the procedures using two different criteria.
- Section 6 gives the three-pointer example, along with a summary of results that concludes the paper.

The numerical studies make apparent which procedure performs best for various values of θ and n.

Confidence Intervals Using the Pivotal Quantity Method
Let $X = (X_1, \ldots, X_n)$ be a random sample. Let $T_1 = t_1(X)$ and $T_2 = t_2(X)$ be two statistics satisfying $T_1 \le T_2$ and $P_\theta[T_1 < \theta < T_2] = 1-\alpha$, where $0 < \alpha < 1$ does not depend on θ. Then the interval $(T_1, T_2)$ is called a 100(1−α)% confidence interval for θ.

A pivotal quantity Q is a function of X and θ whose distribution does not depend on θ. A 100(1−α)% confidence interval for θ is found in three steps:

1. Find $q_1$ and $q_2$ such that $P[q_1 < Q < q_2] = 1-\alpha$. Because the distribution of Q is free of θ, these values depend on α (and possibly n) but not on X or θ.
2. Rewrite the event $\{q_1 < Q < q_2\}$ as the equivalent event $\{t_1(X) < \theta < t_2(X)\}$.
3. Take $(T_1, T_2)$, where $T_i = t_i(X)$ for i = 1 and 2, as the 100(1−α)% confidence interval for θ.
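The pivotal recipe above can be illustrated with $Q_3$. The event $|Q_3| < z_{\alpha/2}$ is equivalent to the quadratic inequality $n(\bar{x}-\theta)^2 < z_{\alpha/2}^2\,\theta$. Its roots give the endpoints
$$\bar{x} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\bar{x}}{n} + \frac{z_{\alpha/2}^2}{4n^2}}.$$
This section does not display (3), so treating these roots as the scores interval in (3) is an assumption. The function name `score_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def score_interval(xbar: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Invert |Q3| < z with Q3 = sqrt(n) * (xbar - theta) / sqrt(theta).

    n * (xbar - theta)**2 < z**2 * theta is a quadratic inequality in theta;
    its roots are xbar + z^2/(2n) +/- z * sqrt(xbar/n + z^2/(4 n^2)).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 standard normal quantile
    center = xbar + z * z / (2 * n)
    radicand = xbar / n + z * z / (4 * n * n)
    if radicand < 0:
        # Possible with normal data when xbar is negative: no theta satisfies the inequality.
        raise ValueError("negative radicand: interval undefined")
    half = z * math.sqrt(radicand)
    return center - half, center + half
```

Both endpoints satisfy $n(\bar{x}-\theta)^2 = z_{\alpha/2}^2\,\theta$ exactly. This is the pivot equation evaluated at the boundary of the acceptance region.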
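When $X_1, \ldots, X_n$ is a random sample from a Poisson(θ) distribution, $Q_4 = (\bar{X} - \theta)/\sqrt{\bar{X}/n}$ is approximately N(0,1) for large n. Using this pivotal quantity, it follows that
$$\bar{X} \pm z_{\alpha/2}\sqrt{\bar{X}/n} \tag{4}$$
is an approximate 100(1−α)% confidence interval for θ, where $z_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution. This is the familiar Wald interval.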
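A minimal sketch of the Wald interval (4) follows. The function name is hypothetical, and z is taken from the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist


def wald_interval(xbar: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Approximate 100(1 - alpha)% Wald interval xbar +/- z * sqrt(xbar / n), from Q4."""
    if xbar < 0:
        raise ValueError("the Wald interval needs a nonnegative sample mean")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(xbar / n)
    return xbar - half, xbar + half
```

For example, `wald_interval(10.0, 25)` gives about (8.760, 11.240). When every observation is 0, the interval collapses to (0.0, 0.0).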
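To find the exact Garwood interval, let $W = \sum_{i=1}^n X_i$, which has a Poisson($n\theta$) distribution, and let w denote its observed value. The lower limit $\theta_L$ solves $P_{\theta_L}(W \ge w) = \alpha/2$, and the upper limit $\theta_U$ solves $P_{\theta_U}(W \le w) = \alpha/2$.

A helpful result that appears in many mathematical statistics textbooks (e.g., Mood et al. (1974)) relates the Poisson and Gamma families. If X has a Poisson($t\theta$) distribution and r is a positive integer, then
$$P(X \ge r) = P(U \le t), \tag{5}$$
where U has a Gamma(r, θ) distribution (shape r, rate θ).

For the lower limit, apply (5) with t = n and r = w:
$$P(W \ge w) = P(U \le n) = P(V \le 2n\theta) = \alpha/2.$$
Here U has a Gamma(w, θ) distribution, and $V = 2\theta U$ has a chi-square distribution with 2w degrees of freedom ($\chi^2(2w)$). Hence $\theta_L = \chi^2_{1-\alpha/2,\,2w}/(2n)$, where $\chi^2_{\gamma,\,\nu}$ denotes the value exceeded with probability γ by a $\chi^2(\nu)$ random variable.

For the upper limit, apply (5) with r = w + 1:
$$P(W \le w) = 1 - P(W \ge w+1) = 1 - P(U \le n) = P(V > 2n\theta) = \alpha/2.$$
Here U has a Gamma(w + 1, θ) distribution, and $V = 2\theta U$ has a $\chi^2(2(w+1))$ distribution. It follows that $\theta_U = \chi^2_{\alpha/2,\,2(w+1)}/(2n)$.

The resulting Garwood 100(1−α)% confidence interval for θ is
$$\left(\frac{\chi^2_{1-\alpha/2,\,2w}}{2n},\ \frac{\chi^2_{\alpha/2,\,2(w+1)}}{2n}\right), \tag{6}$$
with the lower limit taken to be 0 when w = 0.

The exact method of Barker (2002) finds the values of θ at which these Poisson tail probabilities equal α/2, and Barker presents the resulting confidence interval as not being in closed form. He did not use the result in (5) to derive (6) and did not refer to this interval as the Garwood interval. However, Sahai and Khurshid (1993) briefly give this argument for n = 1.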
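The relation in (5) means the limits in (6) can also be found directly from the Poisson tail equations. The sketch below uses that equivalence and solves the two equations by bisection instead of looking up chi-square quantiles. The helper names and the search bracket are illustrative choices.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(W <= k) for W ~ Poisson(mean), mean > 0."""
    if k < 0:
        return 0.0
    return sum(math.exp(j * math.log(mean) - mean - math.lgamma(j + 1)) for j in range(k + 1))


def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of a monotone function f on [lo, hi], where f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid < 0) == (flo < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def garwood_interval(w: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact interval for theta from W = X_1 + ... + X_n ~ Poisson(n * theta), observed w.

    Lower limit: P(W >= w | n*theta) = alpha/2.  Upper limit: P(W <= w | n*theta) = alpha/2.
    We solve for m = n*theta, then divide by n.
    """
    bracket = w + 20 * math.sqrt(w + 1) + 20  # generous upper bound for m
    if w == 0:
        lower = 0.0
    else:
        lower = bisect(lambda m: (1 - poisson_cdf(w - 1, m)) - alpha / 2, 1e-12, bracket) / n
    upper = bisect(lambda m: poisson_cdf(w, m) - alpha / 2, 1e-12, bracket) / n
    return lower, upper
```

For a single count w = 10 (n = 1), this returns roughly (4.795, 18.390), which equals $\chi^2_{0.975,\,20}/2$ and $\chi^2_{0.025,\,22}/2$. For w = 0, the upper limit is $\ln(2/\alpha)$.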

Bayesian Credible Intervals
Let x be an observed value of $X = (X_1, \ldots, X_n)$. A 100(1−α)% Bayesian credible interval for θ is $(t_1(x), t_2(x))$, where $t_1(x) \le t_2(x)$ and $P[t_1(x) \le \theta \le t_2(x) \mid x] = 1-\alpha$. This probability is taken with respect to the posterior distribution of θ given x, whose density is
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}, \qquad m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta.$$
Here f(x|θ) is the joint density of X, π(θ) is the prior density of θ and m(x) is the marginal density of X. In Bayesian analysis, all inference is based on the posterior distribution.
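The normalization by m(x) can also be carried out numerically on a grid. The sketch below does this for the N(θ,θ) case. It assumes the Jeffreys prior $\pi(\theta) \propto \sqrt{1/\theta + 1/(2\theta^2)}$, which follows from the Fisher information of N(θ,θ). This is our derivation, and it may differ in constants from the paper's own normal-case posterior. The grid range, grid size and data are arbitrary illustrative choices.

```python
import math


def log_prior(theta: float) -> float:
    """log sqrt(1/theta + 1/(2 theta^2)), up to an additive constant (assumed Jeffreys prior)."""
    return 0.5 * math.log(2 * theta + 1) - math.log(theta)


def log_lik(theta: float, xs: list[float]) -> float:
    """log f(x | theta) for an i.i.d. N(theta, theta) sample."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * theta) - sum((x - theta) ** 2 for x in xs) / (2 * theta)


def grid_posterior_summary(xs, lo=0.01, hi=300.0, steps=30000):
    """Approximate E(theta|x) and Var(theta|x) by normalizing f(x|theta) pi(theta) on a grid.

    The sum of the grid weights plays the role of the marginal m(x).
    """
    h = (hi - lo) / steps
    thetas = [lo + (i + 0.5) * h for i in range(steps)]
    logs = [log_lik(t, xs) + log_prior(t) for t in thetas]
    top = max(logs)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    m = sum(weights)
    mean = sum(w * t for w, t in zip(weights, thetas)) / m
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas)) / m
    return mean, var
```

Given these two moments, a normal-approximation interval of the form mean ± z·sd follows directly.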
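Assume the posterior distribution of θ is approximately normal with mean $\hat{\mu}_N = E(\theta|x)$ and variance $\hat{\sigma}_N^2 = Var(\theta|x)$. Then an approximate 100(1−α)% Bayesian credible interval for θ is
$$\hat{\mu}_N \pm z_{\alpha/2}\,\hat{\sigma}_N. \tag{9}$$
For the sample that generated Fig. 1, the 95% credible interval for θ given in (9) is (98.25, 107.11), which contains 100.

Now assume $X_1, \ldots, X_n$ is a random sample from a Poisson distribution with mean θ. The joint probability mass function of $X_1, \ldots, X_n$ is then
$$f(x \mid \theta) = \prod_{i=1}^n \frac{\theta^{x_i} e^{-\theta}}{x_i!} = \frac{\theta^{\sum_{i=1}^n x_i}\, e^{-n\theta}}{\prod_{i=1}^n x_i!}. \tag{10}$$
Employing Jeffreys' method again, first note that $\ln f(x_i|\theta) = x_i \ln\theta - \theta - \ln(x_i!)$. The Fisher information is therefore
$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \ln f(X_i \mid \theta)\right] = E\left[\frac{X_i}{\theta^2}\right] = \frac{1}{\theta}.$$
Taking $\pi(\theta) \propto \sqrt{I(\theta)}$ yields the noninformative prior $\pi(\theta) = \theta^{-1/2}$. Using (10), the resulting posterior density of θ given x is
$$\pi(\theta \mid x) = \frac{n^{\sum x_i + 1/2}}{\Gamma\left(\sum x_i + 1/2\right)}\, \theta^{\sum x_i - 1/2} e^{-n\theta}, \quad \theta > 0. \tag{11}$$
This is a Gamma($\sum x_i + 1/2$, n) density (shape, rate), so $E(\theta|x) = (\sum x_i + 1/2)/n$ and $Var(\theta|x) = (\sum x_i + 1/2)/n^2$.

Figure 2 plots the posterior in (11) for $x_1, \ldots, x_{50}$ randomly generated from a Poisson(100) distribution. This posterior looks fairly normal, with posterior mean 98.99 and posterior variance 1.9798. Assume this posterior is approximately normal with mean $\hat{\mu}_P = E(\theta|x)$ and variance $\hat{\sigma}_P^2 = Var(\theta|x)$. Then an approximate 100(1−α)% Bayesian credible interval for θ is
$$\hat{\mu}_P \pm z_{\alpha/2}\,\hat{\sigma}_P. \tag{12}$$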
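Because the posterior in (11) is a Gamma distribution with shape $\sum x_i + 1/2$ and rate n, its moments and the normal-approximation interval need no numerical integration. In the sketch below, the total $\sum x_i = 4949$ is back-solved from the reported posterior mean 98.99 with n = 50 ($50 \times 98.99 - 0.5$). It is not the actual simulated data, which is not listed. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def poisson_jeffreys_interval(total: int, n: int, alpha: float = 0.05):
    """Posterior moments from (11) and the normal-approximation interval (12).

    With prior theta^(-1/2), the posterior is Gamma(shape = total + 1/2, rate = n),
    where total = x_1 + ... + x_n.
    """
    shape = total + 0.5
    mean = shape / n
    var = shape / n ** 2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var)
    return mean, var, (mean - half, mean + half)


mean, var, (lo, hi) = poisson_jeffreys_interval(4949, 50)
print(mean, var)  # matches the reported 98.99 and 1.9798 up to float rounding
```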

Modification of Intervals Assuming a Single Poisson Random Variable
Consider modifying the interval estimators of θ from Sections 2 and 3 for a single Poisson(θ) random variable when θ is large. Let Y have a Poisson(θ) distribution; the previous confidence intervals and credible intervals are then modified by setting n = 1. The chi-square confidence interval in (1) and the t interval in (2) cannot be adapted, because a single observation provides no estimate of the standard deviation: the sum of squared deviations from the mean is 0 when n = 1. The remaining intervals can be modified.
Let y denote the observed value of Y. The scores confidence interval based on $Q_3$ given in (3) becomes
$$y + \frac{z_{\alpha/2}^2}{2} \pm z_{\alpha/2}\sqrt{y + \frac{z_{\alpha/2}^2}{4}}. \tag{13}$$
The Wald confidence interval based on $Q_4$ given in (4), modified for a single Poisson(θ) random variable, is
$$y \pm z_{\alpha/2}\sqrt{y}. \tag{14}$$
The resulting Garwood 100(1−α)% confidence interval for θ is
$$\left(\frac{\chi^2_{1-\alpha/2,\,2y}}{2},\ \frac{\chi^2_{\alpha/2,\,2(y+1)}}{2}\right). \tag{15}$$

For the Bayesian credible intervals in the normal case, the posterior of θ given y, labeled (16), is obtained by setting n = 1 in (8). Let $\hat{\mu}_N$ and $\hat{\sigma}_N^2$ denote the mean and variance of the posterior in (16), and assume this posterior is close to normal. Then an approximate 100(1−α)% Bayesian credible interval for θ is as given in (9), using these new definitions of $\hat{\mu}_N$ and $\hat{\sigma}_N$.

Similarly, in the Poisson case, the posterior density of θ given y is
$$\pi(\theta \mid y) = \frac{\theta^{\,y - 1/2} e^{-\theta}}{\Gamma(y + 1/2)}, \quad \theta > 0. \tag{17}$$
Assume this posterior is approximately normal with mean $\hat{\mu}_P = E(\theta|y)$ and variance $\hat{\sigma}_P^2 = Var(\theta|y)$. Then an approximate 100(1−α)% Bayesian credible interval for θ is as given in (12), using these new definitions of $\hat{\mu}_P$ and $\hat{\sigma}_P$.

Figure 3 plots the posterior densities given in (16) and (17). The credible interval from (12) based on the posterior in (17) is slightly narrower than the interval from (9) based on the posterior in (16).
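The single-count intervals (13), (14) and (12)-with-(17) are simple enough to compute directly. The sketch below omits the Garwood interval (15), which needs chi-square quantiles, and the normal-case interval, whose posterior (16) is not displayed here. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def single_count_intervals(y: int, alpha: float = 0.05) -> dict[str, tuple[float, float]]:
    """Intervals (13), (14) and the (17)-based version of (12) for one Poisson(theta) count y."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    score_center = y + z * z / 2
    score_half = z * math.sqrt(y + z * z / 4)
    wald_half = z * math.sqrt(y)
    # Posterior (17) is Gamma(y + 1/2, rate 1), so its mean and variance both equal y + 1/2.
    bayes_half = z * math.sqrt(y + 0.5)
    return {
        "scores (13)": (score_center - score_half, score_center + score_half),
        "Wald (14)": (y - wald_half, y + wald_half),
        "Bayes, Poisson (17)": (y + 0.5 - bayes_half, y + 0.5 + bayes_half),
    }
```

For any y > 0, the widths satisfy $2z\sqrt{y} < 2z\sqrt{y + 1/2} < 2z\sqrt{y + z^2/4}$ at the 95% level, because $z^2/4 \approx 0.96 > 1/2$. This agrees with the ordering of the Wald and second Bayes approaches reported later for n = 1.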

Numerical Studies
To compare the seven approaches, a simulation study is performed for various values of the parameter θ and the sample size n, under each of the two underlying distributions from which the random sample is taken. The approaches are as follows:

- Approaches 1 through 4 use the confidence intervals for θ given in (1), (2), (3) and (4), respectively.
- Approaches 5 and 6 use the credible intervals for θ given in (9) and (12), respectively.
- Approach 7 is the Garwood confidence interval in (6).

Approaches 1 and 2 are not available for n = 1, because a single observation gives no estimate of the variance. For n = 1, approaches 3, 4 and 7 use the confidence intervals given in (13), (14) and (15), respectively. Approaches 5 and 6 use the intervals given in (9) and (12), based on the posterior densities given in (16) and (17), respectively.
All combinations of n = 5, 10, 20, 50 and 100 with θ = 1, 5, 10, 20, 50 and 100 are considered under both the N(θ,θ) and Poisson(θ) distributions. In addition, θ = 50, 100, 150, 200, 250, 300, 350 and 400 are investigated for n = 1 under a Poisson(θ) distribution. For each combination of n, θ and distribution, a random sample is generated and the various intervals are computed. For each resulting interval, the simulation records whether θ lies in it and the interval's length.

This process is repeated 10,000 times. For each approach, the study then computes the percentage of replications in which the interval covers the parameter and the average interval length. Casella and Berger (2002) suggest size and coverage probability as criteria for evaluating confidence intervals.

In the tables, bold values indicate the approach with the smallest average length among all approaches whose coverage percentage is at least 95%. Bold italic values indicate the approach with the smallest average length among all approaches whose coverage percentage is at least 94.50%, which rounds to 95%.
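A scaled-down version of this procedure is sketched below for two of the seven approaches, 4 (Wald) and 6 (second Bayes), with Poisson data. It uses 2,000 rather than 10,000 replications. The Poisson draws use Knuth's multiplication method, which is our choice and is adequate for moderate θ. The selection rule mirrors the bold and bold italic criteria. All names and the seed are illustrative.

```python
import math
import random
from statistics import NormalDist


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for a Poisson(lam) variate."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def wald(xs, z):
    xbar = sum(xs) / len(xs)
    half = z * math.sqrt(xbar / len(xs))
    return xbar - half, xbar + half


def bayes_poisson(xs, z):
    n, shape = len(xs), sum(xs) + 0.5  # Gamma(sum + 1/2, rate n) posterior, as in (11)
    mean, sd = shape / n, math.sqrt(shape) / n
    return mean - z * sd, mean + z * sd


def simulate(theta, n, reps=2000, alpha=0.05, seed=1):
    """Return {approach: (coverage proportion, average length)} over reps samples."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    methods = {"4 (Wald)": wald, "6 (Bayes, Poisson)": bayes_poisson}
    hits = dict.fromkeys(methods, 0)
    total_len = dict.fromkeys(methods, 0.0)
    for _ in range(reps):
        xs = [poisson_draw(rng, theta) for _ in range(n)]
        for name, method in methods.items():
            lo, hi = method(xs, z)
            hits[name] += lo <= theta <= hi
            total_len[name] += hi - lo
    return {name: (hits[name] / reps, total_len[name] / reps) for name in methods}


def best_approach(results, threshold=0.95):
    """Shortest average length among approaches whose coverage is at least threshold."""
    eligible = {k: v for k, v in results.items() if v[0] >= threshold}
    return min(eligible, key=lambda k: eligible[k][1]) if eligible else None
```

Calling `best_approach(simulate(20, 20))` applies the bold rule, and passing `threshold=0.945` applies the bold italic rule.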
The criteria of coverage percentage and length closely match those of Barker (2002), who also considered whether the interval is in closed form. Barker compared the Wald interval, the scores interval, the Garwood interval and six others for values of nθ between 0.5 and 5.0 using these three criteria. Barker recommends the Garwood interval when the coverage percentage must never fall below 95%, and the scores interval when approximate coverage and wider intervals are acceptable. Barker did not point out that the method he called the exact method is actually Garwood's method, which yields an interval in closed form.

In Barker's numerical study, the Garwood interval was always narrower than the scores interval. The opposite holds in this numerical study; however, Barker considered only values of nθ not exceeding 5. Note also that the Wald interval had poor coverage in Barker's study.
Table 1 gives the results for the various θ and n (n > 1) when sampling from a normal distribution. The coverage percentages of all the approaches are around 95% for most of the values of θ and n considered. The only exceptions are:

- approaches 4 (Wald) and 6 (second Bayes), which have low coverage percentages for small θ and n;
- approach 7 (Garwood), which has high coverage percentages for small θ and n.

It is not surprising that approach 4 does poorly for small n, since it is based on $Q_4$, which is approximately standard normal only for large n. Even for small n, however, approach 4 has good coverage percentages for larger θ.

Two practical restrictions arose in the simulations. Approach 4 cannot be used when $\bar{x}$ is negative. This happened only when θ = 1 and n = 5 or 10; in these cases, coverage percentages and average lengths were based on the replications in which $\bar{x}$ was positive, which was over 99% of them. Similarly, approach 3 (scores) cannot be used when the radicand in (3) is negative. This occurred in only a few replications when θ = 1 and n = 5 or 10, and adjustments were made accordingly.

The Bayesian approach assuming normal data (approach 5) tended to yield the smallest average length while keeping coverage probabilities of at least 0.95. The only exception was n = 5, where approach 5 had the smallest average length for the various θ but coverage percentages slightly below 95%. Except when θ = 1 or n = 5, approach 4 (Wald) performed well, followed by approaches 6 (second Bayes) and 3 (scores). When n = 5, approach 3 performed best for smaller values of θ. For n = 50 and 100, approaches 3, 4 and 6 have very close average lengths for every θ, but approach 5 has the smallest average lengths. Approach 7 (Garwood) performed best in only one case (θ = 50, n = 5), where several other approaches had smaller average lengths but coverage percentages a little below 95%.
This is not surprising, as the Garwood interval is not based on normal data. Approach 1 is the worst: its average length far exceeds those of the other intervals, although its coverage percentages are reasonable.

Table 2 considers the same values of θ and n as Table 1, but with samples taken from a Poisson distribution. For small θ and n, it is possible to observe $x_1 = \cdots = x_n$. This yields (0, 0) for approach 1 and $(\bar{x}, \bar{x})$ for approach 2; approach 4 yields (0, 0) only when every $x_i = 0$. These cases occurred less than 1% of the time, and the coverage percentages and average lengths were computed only over the replications in which this did not occur.

For smaller θ, approaches 1 (for θ = 1 and 5) and 5 (for θ = 1, 5 and 10) tended to have coverage probabilities below 0.95, but these probabilities approach 0.95 as θ increases. Approach 4 performed best for these values of θ, followed closely by approaches 6 and 3. The exception is θ = 1 with n = 5 or 10, where approach 4 had coverage percentages that were too low and approach 6 performed best, followed by approaches 3 and 7. For larger θ (θ = 20, 50, 100), the coverage percentages of all the approaches are around 95% for the various n. In this case, approach 5 performs best, followed by approaches 4, 6 and 3; for n = 20, 50 and 100, approaches 3, 4 and 6 have almost identical average lengths. Again, approach 1 has much larger average lengths than the other approaches for all θ and n considered.

Table 3 gives the results for various large values of θ when sampling from the Poisson distribution with n = 1, where approaches 3 through 7 are considered. The coverage percentages of all five approaches are around 95% for all the values of θ given. The Wald approach (approach 4) dominates, followed by the Bayesian approach assuming sampling from a Poisson population (approach 6).
Approach 5 is the next best approach, and approaches 3 and 5 have identical average lengths for large θ. Approach 7 (Garwood) had good coverage probability but the largest average length of the five approaches. The good performance of approach 4 may be surprising, since one might expect it to do poorly for small n. However, even in Tables 1 and 2, approach 4 performed well for larger θ when n is small.
In Tables 1 and 2, for each approach, average lengths increase as θ increases for fixed n and decrease as n increases for fixed θ. In Table 3, average lengths increase as θ increases for each approach.
Consider comparing the ($n_1 = 1$, $\theta_1 = \theta$) and ($n_2 = n$, $\theta_2 = \theta/n$) cases in Tables 2 and 3, where Poisson data were assumed. For example, compare the (1, 250) and (50, 5) cases, or the (1, 400) and (20, 20) cases. For approaches 3, 4, 6 and 7, the average interval length in the (1, θ) case was about n times that in the corresponding (n, θ/n) case.

To see why, compare (3) to (13), (4) to (14), (11) to (17) and (6) to (15). When $X_1, \ldots, X_n$ is a random sample from a Poisson(θ/n) distribution, $Y = \sum_{i=1}^n X_i$ is a Poisson(θ) random variable. Multiplying the intervals in (3), (4) and (6) by n gives the intervals (13), (14) and (15), respectively, with $y = n\bar{x}$. Likewise, (11) is the posterior of $\theta_2 = \theta/n$, and the posterior of $n\theta_2$ implied by (11) coincides with (17). Hence the interval from (12) based on (11), multiplied by n, equals the interval based on (17). Unfortunately, the cases in Tables 2 and 3 cannot be reduced to one another in this way, because this observation applies only to approaches 3, 4, 6 and 7.

It is difficult to compare the results of this simulation study with the conclusions of Schwertman and Martinez (1994), Barker (2002), Byrne and Kabaila (2005) and Patil and Kulkarni (2012), because these studies used different comparison criteria and different values of n and θ. However, it is fair to say that some of them viewed the Garwood interval more favorably, and the Wald interval much less favorably, than this study does. For instance, for n = 1, Patil and Kulkarni (2012) found that the Garwood and scores intervals had good coverage rates for θ in (0, 50]. They also found that, among their 19 intervals, the Garwood interval and three others had the shortest lengths for θ between 4 and 50. Furthermore, they recommended avoiding the Wald interval because of its low coverage rates for all θ they considered.
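The scaling argument above can be verified numerically for the closed-form approaches 3, 4 and 6. The sketch below checks that n times each (n, θ/n) interval equals the corresponding n = 1 interval for $y = \sum x_i$. The Garwood interval is left out because it needs chi-square quantiles. All names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def score_n(xbar, n, z=Z):
    center = xbar + z * z / (2 * n)
    half = z * math.sqrt(xbar / n + z * z / (4 * n * n))
    return center - half, center + half


def wald_n(xbar, n, z=Z):
    half = z * math.sqrt(xbar / n)
    return xbar - half, xbar + half


def bayes_poisson_n(total, n, z=Z):
    mean, sd = (total + 0.5) / n, math.sqrt(total + 0.5) / n
    return mean - z * sd, mean + z * sd


def scaled_matches(total: int, n: int, tol: float = 1e-9) -> bool:
    """True if n times each (n, theta/n) interval equals the n = 1 interval for y = total."""
    pairs = [
        (score_n(total / n, n), score_n(total, 1)),
        (wald_n(total / n, n), wald_n(total, 1)),
        (bayes_poisson_n(total, n), bayes_poisson_n(total, 1)),
    ]
    return all(abs(n * a - b) < tol and abs(n * c - d) < tol for (a, c), (b, d) in pairs)
```

For example, `scaled_matches(250, 50)` corresponds to the (50, 5) versus (1, 250) comparison.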
Tanusit (2012) considered θ = 1, 2,…, 5 and n = 10, 11,…, 100 and recommended the scores and Wald intervals for small n and large n, respectively.
Khamkong (2012) gives an adaptation of the Wald interval, referred to as AWC. Khamkong's simulation study considered θ = 1, 1.5, 3, 5 and 10 and n = 15, 25, 50 and 100, so there is some overlap with the numerical study of this paper. The length of AWC equals the length of the Wald interval in (4) and is less than the length of the scores interval in (3). In Khamkong's simulation, the coverage rates of the scores interval and AWC are usually equal and below 95%.

The table below compares AWC with the approaches of this paper using Poisson data, for values of θ and n that include, or are close to, those in Khamkong's simulation. For each case, it notes the approach with the shortest length among those with a coverage rate of at least 95%. Referring to the AWC interval as approach 8, the Wald interval and the second Bayes approach dominate for these values of θ and n.

Under the N(θ,θ) model, $T = \sum_{i=1}^n X_i^2$ is a minimal sufficient statistic for θ, because the joint density of $X_1, \ldots, X_n$ in (7) can be written as
$$f(x \mid \theta) = (2\pi\theta)^{-n/2} \exp\left(-\frac{T}{2\theta} - \frac{n\theta}{2}\right) \exp\left(\sum_{i=1}^n x_i\right).$$
By the Sufficiency Principle, any inference about θ should depend on the sample $(X_1, \ldots, X_n)$ only through T. However, $Q_1$ and $Q_2$ involve $\bar{X}$ as well as T, and $Q_3$ does not involve T at all. Under the Poisson(θ) model, the Wald interval based on $Q_4$, the Bayesian approach yielding (12) and the exact Garwood interval depend on the data only through $\sum_{i=1}^n X_i$, which is a minimal sufficient statistic for θ. The only approach that depends on the data solely through T is the Bayesian approach that assumes the N(θ,θ) model. It therefore makes sense that approach 5 dominates for normal data, as seen in Table 1.
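The factorization above implies that two samples with the same T have likelihoods that differ only by a factor free of θ, so they lead to the same posterior. The sketch below checks this with two illustrative samples, (3, 4) and (0, 5), both having T = 25. The log-likelihood difference should equal $\sum a_i - \sum b_i = 2$ for every θ.

```python
import math


def normal_loglik(theta: float, xs) -> float:
    """log f(x | theta) for an i.i.d. N(theta, theta) sample."""
    return -0.5 * len(xs) * math.log(2 * math.pi * theta) - sum((x - theta) ** 2 for x in xs) / (2 * theta)


a = [3.0, 4.0]  # sum of squares 25, sum 7
b = [0.0, 5.0]  # sum of squares 25, sum 5
diffs = [normal_loglik(t, a) - normal_loglik(t, b) for t in (0.5, 1.0, 5.0, 20.0)]
print(diffs)  # each entry equals sum(a) - sum(b) = 2, up to rounding
```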

Example
Since the National Basketball Association (NBA) and the American Basketball Association (ABA) merged in 1976, the San Antonio Spurs have won five NBA championships and 22 division titles. The NBA adopted the three-point shot for the 1979-1980 season. For a shot to count as a three-point attempt, the player's feet must be completely behind the three-point line at the time of the shot or jump. In recent years, the Spurs have typically been a very good three-point shooting team. … 20, 24, 21, 18, 27, 30, 22, 20, 17, 15, 28, 20, 18, 20, 17, 16, 24, 22 and 17, respectively.

From the table, the actual value of θ, 21.4, lies in each interval. Approach 1 yielded by far the widest interval, while approach 5 (first Bayes) yielded the shortest, followed closely by approaches 4 (Wald), 6 (second Bayes) and 3 (scores). This is what the discussion of Table 2 for θ = n = 20 would lead one to expect.
The number of three-pointers made in the same 20 games may also be of interest. These values were 12, 10, 7, 8, 7, 16, 10, 4, 9, 6, 10, 13, 9, 4, 8, 5, 10, 13, 9 and 4, and the actual mean number of three-pointers made per game was θ = 8.3. The sample mean and variance are 8.700 and 10.642, respectively. From the table below, 8.3 lies within each interval. Approach 1 is again the worst, whereas approach 5 is the best, followed closely by approaches 4, 6 and 3. Based on the discussion of Table 2 for small θ and n = 20, approach 4 was expected to perform best.
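The summary statistics for the makes can be reproduced from the listed values. The sketch below also computes three of the intervals (Wald, scores and the Poisson-Jeffreys version of (12)) and checks that each contains 8.3. The attempts data above appear with only 19 of the 20 values, so they are not used here.

```python
import math
from statistics import NormalDist, mean, variance

makes = [12, 10, 7, 8, 7, 16, 10, 4, 9, 6, 10, 13, 9, 4, 8, 5, 10, 13, 9, 4]
n = len(makes)
xbar = mean(makes)      # 8.7
s2 = variance(makes)    # sample variance with divisor n - 1, about 10.642
z = NormalDist().inv_cdf(0.975)

wald = (xbar - z * math.sqrt(xbar / n), xbar + z * math.sqrt(xbar / n))
center = xbar + z * z / (2 * n)
half = z * math.sqrt(xbar / n + z * z / (4 * n * n))
score = (center - half, center + half)
shape = sum(makes) + 0.5  # Gamma(sum + 1/2, rate n) posterior, as in (11)
bayes = (shape / n - z * math.sqrt(shape) / n, shape / n + z * math.sqrt(shape) / n)

for name, (lo, hi) in [("Wald (4)", wald), ("scores (3)", score), ("Bayes, Poisson (12)", bayes)]:
    print(f"{name}: ({lo:.3f}, {hi:.3f}) contains 8.3: {lo < 8.3 < hi}")
```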

Conclusion
When sampling from a N(θ,θ) or a Poisson(θ) distribution, the Bayesian credible interval obtained by assuming normal data (approach 5) appears to perform very well for various values of θ and n. The only exceptions are:

- sampling from a N(θ,θ) distribution with small θ and n, where approach 3 (scores) is preferred;
- sampling from a Poisson(θ) distribution with θ small, or with θ not small and n = 1, where approach 4 (Wald) is preferred.
Statisticians continually seek to improve methods for various inference problems. Estimating a Poisson mean is not a new problem and has been discussed in the literature for many years. Although the distribution was named after Siméon Denis Poisson in the 1800s, some believe it should have been named after Abraham de Moivre, who appears to have been the first to discover it, in the early 1700s.