Does Gaussian Nucleus Entropy Help? A Case in Point: Predicting the Number of Cesarean Births

Email: shanmugam@txstate.edu

Abstract: In this article, the entropy in collected data about the Gaussian population mean is traced from its embryonic stage as new data are periodically collected. The traditional Shannon entropy has shortcomings from the data-analytics point of view, and this creates a necessity to refine it. The refined version is named the Gaussian nucleus entropy in this article, and its advantages are pointed out. The prior, likelihood, posterior and predictive nucleus entropies are derived, interconnected and interpreted. The results are illustrated using data on cesarean births in thirteen countries over the period [1987, 2007]. The medical communities and families are alarmed, as cesarean births are increasing not on an emergency or necessity basis but rather on a monetary or convenience basis. A nucleus entropy based data analysis answers whether their alarm is baseless.


Introduction
What is entropy? In essence, entropy is all about information. Shannon (1948) introduced the seminal idea of capturing the information in collected data about an unknown parameter. In his example dealing with electronic communication, Shannon evaluated the amount of the message captured at a destination location compared to the amount transmitted out of an originating location. Upon the advice of his friend John von Neumann, Shannon misnamed it entropy for a reason. What was the reason? Claude Shannon himself revealed it: "My greatest concern was what to call it. I thought of calling it information, but the word was overly used, so I decided to call it uncertainty. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, you should call it entropy, for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage" (Tribus and McIrvine, 1971). However, Shannon's entropy concept is popularly utilized in medicine, health, engineering, business, economics, marketing and statistics, among other disciplines, with a contextual interpretation opposite to what Shannon really intended (Smelser and Baltes, 2001).
A litmus test for an entropy is that it should be quantifiable, partially orderable, additive, storable and transmittable. Shannon's entropy possesses several of the properties in this list but lacks the much-needed additive property, which is a requirement in data analysis. When an additional observation becomes available, the entropy ought to increase accordingly. Shannon's entropy does not do so and hence it needs a refinement. These and other controversies about the Shannon entropy convinced professionals to give up on entropy, as documented in Ben-Naim (2011). Instead, why not refine entropy and use it? Exactly this question is answered in this article.
Shannon's entropy is refined in this article by peeling away the unnecessary entropy junk. The refined new version is named nucleus entropy. The Gaussian probability distribution is the focus because it is more popularly employed in data analysis than any other distribution. The advantages of the Gaussian nucleus entropy over the traditional Shannon entropy are pointed out.
The biostatistics community often selects the Bayes approach over the frequentist approach to perform data analysis and prediction as new data periodically enter the databases. In the Bayes approach, the conjugate prior plays a crucial, cohesive role jointly with the data-dominant likelihood function in shaping the posterior frequency density curve. For the sake of predicting a yet-to-be-observed value, the predictive frequency density curve is often employed in the Bayesian analysis. In the course of time, a posterior frequency curve of the past becomes the prior for building the posterior frequency curve of the current time as new data become available. In this framework and practice, the Gaussian conjugate prior, data-dominant likelihood, posterior and predictive nucleus entropies are derived, interconnected and interpreted.
In the illustration, the log-transformation of the number of cesarean births (in 1,000s) over the years since 1987 in thirteen countries, namely Belgium, Canada, the Czech Republic, Denmark, Finland, Ireland, Italy, Norway, Portugal, Slovakia, Spain, Sweden and the US in Declercq et al. (2011), is considered, analyzed and interpreted. The future number of cesarean births in each of these thirteen countries is predicted and compared. The advantages of the Gaussian nucleus entropy are articulated. A few conclusive comments are made at the end.

Gaussian Nucleus Entropies and Their Properties
To be specific, consider a random sample $y_1, y_2, \ldots, y_n$ drawn from a Gaussian population with density

$$f(y \mid \mu, \sigma_y^2) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left\{-\frac{(y-\mu)^2}{2\sigma_y^2}\right\}, \quad -\infty < y < \infty, \quad \text{(Equation 1)}$$

whose Shannon entropy is

$$H(y) = \tfrac{1}{2}\ln\!\left(2\pi e \sigma_y^2\right). \quad \text{(Equation 2)}$$

Notice that the natural parameter $\mu$ is not even a part of the entropy (2). Furthermore, the Shannon entropy exhibits a weakness from the practitioners' point of view. To see it, first note that it is known (Stuart and Ord, 1994) that the sum $s = y_1 + y_2 + \cdots + y_n$ of $n$ independent and identically Gaussian distributed observations follows the Gaussian probability structure $f(s \mid n\mu, n\sigma_y^2)$. In that case, the Shannon entropy of the sum $s$ ought to be $n$ times the Shannon entropy of a single observation, but it is instead $\tfrac{1}{2}\ln(2\pi e n\sigma_y^2)$, which exceeds (2) only by the amount $\tfrac{1}{2}\ln n$. With an observation $-\infty < y < \infty$, an unknown (natural) parameter $-\infty < \mu < \infty$ and an entropy accumulator $\sigma_y^2$, the remaining pieces of the density are the entropy junk that the refinement peels away. Unlike Shannon's entropy, notice that the nucleus entropy involves both the unknown natural parameter $\mu$ and the entropy accumulator parameter $\sigma_y^2$. The Gaussian nucleus entropy is more appropriate, appealing and meaningful than Shannon's entropy.
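The non-additivity just described can be checked numerically. The sketch below is our own illustration, not part of the original derivation; the function name is ours. It computes the Shannon entropy $\tfrac{1}{2}\ln(2\pi e \sigma_y^2)$ of one Gaussian observation and of the sum of $n$ such observations, and shows that the sum's entropy grows only by $\tfrac{1}{2}\ln n$ rather than by a factor of $n$:

```python
import math

def shannon_entropy_gaussian(var):
    """Differential (Shannon) entropy of a Gaussian, 0.5*ln(2*pi*e*var).
    Note that the mean mu does not appear at all."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

n, var = 10, 4.0
h_single = shannon_entropy_gaussian(var)

# The sum of n iid N(mu, var) observations is N(n*mu, n*var), so its
# Shannon entropy is h_single + 0.5*ln(n) -- far short of n * h_single.
h_sum = shannon_entropy_gaussian(n * var)

assert math.isclose(h_sum, h_single + 0.5 * math.log(n))
assert h_sum < n * h_single
```

The assertions make the additivity failure explicit: if entropy were additive, the second assertion would fail.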
From now on, the variance $\sigma_y^2$ could perhaps be recognized and called the nucleus entropy accumulator. In other words, from (2b), the Gaussian nucleus entropy follows for a sum $s = y_1 + y_2 + \cdots + y_n$ of $n$ independent and identically distributed Gaussian observations.
Without losing any generality, the nucleus entropy can be expressed on a logarithmic scale for the sake of comparison with Shannon's entropy, which is on a logarithmic scale. The expected nucleus entropy of a single observation $y$ is given in Equation 3a. The sample counterpart of (3a) is named the calculable Gaussian nucleus entropy and is given in Equation 3b. The expected and observed nucleus entropies of a sum $s = y_1 + y_2 + \cdots + y_n$ of $n$ random sample observations are given in Equations 4a and 4b. Notice that the observed likelihood nucleus entropy (4b) is calculable. The ratio of the conditional expectation to the variance is the explainable risk in the likelihood for addressing the unknown Gaussian mean parameter $\mu$ (O'Hagan, 1994). Notice that the likelihood-based risk is greater in the high neighborhood of the mean.
In the Bayesian analysis, the prior distribution for the unknown (natural) parameter of interest needs to be chosen in such a manner that the current data likelihood function is the dominant factor in the process of updating to obtain the posterior distribution. A uniform or a conjugate prior distribution is conventionally considered. The conjugate prior is more versatile than the uniform prior. In addition, in terms of probability structure, the conjugate prior distribution is compatible with the likelihood function (Shanmugam, 1992). For the unknown (natural) Gaussian mean parameter, the Gaussian distribution

$$\mu \sim N\!\left(\bar{y}_0,\; \sigma_y^2 / n_0\right) \quad \text{(Equation 5)}$$

is known (O'Hagan, 1994) to be the conjugate prior distribution based on a prior sample of size $n_0 \geq 1$ with mean $\bar{y}_0$. On its own merit, the conjugate prior distribution (5) is Gaussian and hence the expected and observed prior Gaussian nucleus entropies follow as in Equations 6a and 6b, respectively. A comment is necessary here: the expected prior Gaussian nucleus entropy (6a) is calculable. Interestingly, there is a parallelism among expressions (4a), (4b), (6a) and (6b), and it is due to the conjugation principle. The ratio of the prior conditional expectation to the prior variance is the calculable prior risk for addressing the unknown Gaussian mean parameter $\mu$ (O'Hagan, 1994). When the Gaussian nucleus entropy accumulation is zero in the beginning (that is, $\sigma_y^2 = 0$), the calculable prior risk is the maximum possible. Notice that the calculable prior risk reduces when the Gaussian nucleus entropy accumulation $\sigma_y^2$ increases. The importance of the Gaussian nucleus entropy could not be clearer.
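The behavior of the prior risk can be sketched numerically. The snippet below assumes, as our own reading of the text, that the prior risk is the ratio $n_0\bar{y}_0/\sigma_y^2$ of the prior expectation to the prior variance under the conjugate prior; the function name is ours:

```python
# Prior risk for the Gaussian mean under the conjugate prior
# mu ~ N(ybar0, sigma2 / n0): the ratio of the prior expectation to the
# prior variance, i.e. n0 * ybar0 / sigma2 (our reading of the text).
def prior_risk(ybar0, n0, sigma2):
    return ybar0 / (sigma2 / n0)

# The risk shrinks as the entropy accumulator sigma2 grows:
risks = [prior_risk(2.0, 3, s2) for s2 in (0.5, 1.0, 2.0, 4.0)]
assert risks == sorted(risks, reverse=True)
```

The assertion mirrors the text's observation that the calculable prior risk reduces as the entropy accumulation increases.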
A beauty of the Bayesian analysis is that even if the prior distribution happens to be a bad selection, it is eventually moderated in the posterior distribution. What is the posterior distribution? It is an update of the prior obtained by mixing it with the data-dominated likelihood function and integrating out their commonalities. Mathematically, the result is

$$\mu \mid y_1, \ldots, y_n \sim N\!\left(\frac{n_0 \bar{y}_0 + n \bar{y}}{n_0 + n},\; \frac{\sigma_y^2}{n_0 + n}\right). \quad \text{(Equation 7)}$$
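The posterior update is the standard conjugate Gaussian result for a mean with known variance and can be sketched as follows (the function name and the numbers are illustrative, not from the paper's data):

```python
def posterior(ybar0, n0, ybar, n, sigma2):
    """Standard conjugate Gaussian update for the mean with known
    variance sigma2 (e.g. O'Hagan, 1994): prior mu ~ N(ybar0, sigma2/n0)
    combined with a current sample of size n and mean ybar."""
    post_mean = (n0 * ybar0 + n * ybar) / (n0 + n)
    post_var = sigma2 / (n0 + n)
    return post_mean, post_var

# Illustrative numbers: prior sample n0 = 3, current sample n = 2.
m, v = posterior(ybar0=1.0, n0=3, ybar=2.0, n=2, sigma2=4.0)
# Posterior mean is the sample-size weighted average (3*1 + 2*2)/5 = 1.4,
# and the posterior variance 4/5 = 0.8 shrinks as n0 + n grows.
```

The weighting makes the moderation visible: a bad prior mean is pulled toward the data mean in proportion to the current sample size.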
The posterior distribution (7) is also Gaussian and hence the expected and observed posterior Gaussian nucleus entropies follow as in Equations 8a and 8b, respectively. The ratio of the posterior conditional expectation to the posterior variance is the calculable posterior risk for addressing the unknown Gaussian mean parameter $\mu$ (O'Hagan, 1994). When the Gaussian nucleus entropy accumulation is zero in the beginning (that is, $\sigma_y^2 = 0$), the calculable posterior risk is the maximum possible. Notice that the calculable posterior risk reduces when the Gaussian nucleus entropy accumulation increases. The purpose of knowledge discovery is only to reap its benefits currently or in the future. The Bayesian concept meets this practical purpose. In other words, based on the prior or posterior distribution, the Bayesian approach leads to finding and using the so-called predictive distribution to forecast the future calculable mean $\tilde{y}$ based on a yet-to-be-collected random sample of size $m \geq 1$. The prior predictive density (O'Hagan, 1994) is

$$\tilde{y} \sim N\!\left(\bar{y}_0,\; \sigma_y^2 \left(\frac{1}{m} + \frac{1}{n_0}\right)\right), \quad \text{(Equation 9)}$$

which is also Gaussian. Hence, the expected and observed prior predictive Gaussian nucleus entropies follow as in Equations 10a and 10b, respectively. Again, the expected prior predictive Gaussian nucleus entropy (10a) is calculable.
Due to the posterior distribution (7), the posterior predictive density is

$$\tilde{y} \sim N\!\left(\frac{n_0 \bar{y}_0 + n \bar{y}}{n_0 + n},\; \sigma_y^2 \left(\frac{1}{m} + \frac{1}{n_0 + n}\right)\right), \quad \text{(Equation 11)}$$

which is also Gaussian. Hence, the expected and observed posterior predictive Gaussian nucleus entropies follow as in Equations 12a and 12b, respectively, which suggests that the extra quadratic information increases when the Gaussian nucleus entropy accumulation increases and/or the current sample size far exceeds the prior sample size.
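The two predictive densities (9) and (11) can be compared numerically. The sketch below uses our own helper names for the standard conjugate-Gaussian predictive variances and shows that the posterior predictive is always tighter than the prior predictive, since $1/(n_0+n) < 1/n_0$:

```python
def prior_predictive_var(n0, sigma2, m):
    """Variance of the prior predictive for the mean of m future
    observations: sigma2 * (1/m + 1/n0), as in Equation 9."""
    return sigma2 * (1.0 / m + 1.0 / n0)

def posterior_predictive_var(n0, n, sigma2, m):
    """Variance of the posterior predictive: sigma2 * (1/m + 1/(n0 + n)),
    as in Equation 11."""
    return sigma2 * (1.0 / m + 1.0 / (n0 + n))

# With any current sample size n >= 1 the posterior predictive is
# tighter, because 1/(n0 + n) < 1/n0.
v_prior = prior_predictive_var(n0=3, sigma2=4.0, m=1)
v_post = posterior_predictive_var(n0=3, n=2, sigma2=4.0, m=1)
assert v_post < v_prior
```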
The calculable jump in the predictive Gaussian nucleus entropy from the prior to the posterior time is given in Equation 14. The calculable jump from the prior Gaussian nucleus entropy to the posterior nucleus entropy is given in Equation 15, with its factors defined in Equation 16.

Theorem 1
Because the first three factors in (16) are less than one, when the condition stated in (16) holds, the Gaussian nucleus entropy in the current sample is less than its counterpart in the prior sample and hence the current sample is less informative than the prior sample.
When Theorem 1 prevails, further sampling might not be worthwhile.

Case in Point: Gaussian Nucleus Entropy to Predict the Number of Cesarean Births
In this section, the results of the previous section are illustrated using the data on the number Y of cesarean births (in 1,000s) in thirteen countries, namely Belgium, Canada, the Czech Republic, Denmark, Finland, Ireland, Italy, Norway, Portugal, Slovakia, Spain, Sweden and the US, during 1987 through 2002 in Declercq et al. (2011). The first and foremost data-analysis step checks whether lnY (the logarithm is taken because the numbers are big in size) follows a Gaussian distribution. Figures 1 to 5 confirm that lnY indeed follows a Gaussian distribution, because the dots lie close to the upward diagonal line. Hence, the data are suitable for illustrating the Gaussian nucleus entropy results of the previous section. See Fig. 6 to realize the proximity and trend among the years of the incidences.
In our analysis, we treat the data before the year 2000 as the prior sample (with $n_0 = 3$) and the data after the year 2000 as the current sample (with $n = 2$). See Table 1 for the values of lnY. Fig. 7 confirms that the expected posterior Gaussian nucleus entropy dominates the corresponding expected prior Gaussian nucleus entropy in every country. This finding would have been missed if Shannon's entropy were used, and it emphasizes the importance of the nucleus entropy of this article. First, Fig. 8 informs that the average incidence of cesarean births during the years 2000-2007 has been consistently greater than the average incidence of cesarean births during the years 1987-2000 in every country. It therefore confirms that the concern of the medical professionals and the communities about increasing cesarean births is legitimate. The extra quadratic information ℚ in the current sample compared to what it was in the prior sample is consistently more than the percent reduction in the calculable risk ℜ in each country (Fig. 9), and it confirms that the data information is more than the risk of addressing the Gaussian population mean, according to the nucleus entropy. Such a confirmation is not possible when Shannon's entropy is involved, and it demonstrates the importance of the nucleus entropy as a refinement of Shannon's entropy. Finally, we notice (Fig. 10) that the countries cluster together with respect to all of the above-mentioned nucleus entropy related results. Specifically, the US and Canada form the first cluster; Norway, Sweden and Denmark form the second cluster; and the Czech Republic, Finland, Ireland, Italy, Spain, Belgium and Portugal form the third cluster. The visualization of such clusters of countries is feasible due to the concept of nucleus entropy and is not possible using the complicated Shannon's entropy.
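A minimal sketch of the prior/current split used above, on invented numbers (not the Declercq et al., 2011 data), shows the kind of comparison behind Fig. 8:

```python
import math

# Invented illustrative counts (NOT the Declercq et al., 2011 data):
# three pre-2000 yearly cesarean counts (in 1,000s) form the prior
# sample (n0 = 3) and two post-2000 counts the current sample (n = 2).
prior_counts = [98.0, 104.0, 111.0]
current_counts = [150.0, 162.0]

# The analysis works on lnY because the raw counts are big in size.
ln_prior = [math.log(c) for c in prior_counts]
ln_current = [math.log(c) for c in current_counts]

ybar0 = sum(ln_prior) / len(ln_prior)
ybar = sum(ln_current) / len(ln_current)

# The Fig. 8 style comparison: for these invented numbers, the
# post-2000 average log-incidence exceeds the pre-2000 average.
assert ybar > ybar0
```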

Conclusion
In conclusion, the central theme of this article, nucleus entropy, refines Shannon's seminal and wonderful concept for capturing data information. Though the refined nucleus entropy has been illustrated using the cesarean data, it is versatile enough to be useful in finance, economics, marketing, engineering, medical and health studies as well. Further research on building regression models using the nucleus entropy concept is underway and will be communicated later in a journal article.