Is Air Pollution Controlled Enough? “Tapered Probabilities” Answer

Email: rs25@txstate.edu

Abstract: Air pollution is dangerous, especially to the residents of cities, as it causes lung, heart and respiratory illnesses. Public health professionals ponder whether air pollution is significantly controlled. The amount of existing air pollution is measurable, but its regulated level is not; the regulated level is therefore treated as a parameter, in a novel manner, in this article. However, there is no suitable methodology in the literature to extract an estimate of the regulated air pollution level from data and test whether it is significant. To fulfill the need, this article constructs a methodology based on a tapered probability model for the air pollution data. The new methodology is demonstrated using the air pollution data of the African, American, Asian, European and Oceanic continents.


Motivation
Air pollution is a serious menace to the residents of cities worldwide, who are vulnerable to respiratory illness (see Dominici et al., 2000 for details). More often than not, air pollution is life threatening to patients with asthma or respiratory complications (Louwies et al., 2013). Air pollution data are meticulously collected but are not well utilized to learn their intricate patterns. One reason is the lack of a suitable analytic methodology. After a thorough literature search, we notice that no satisfactory underlying probability model for the air pollution data seems to have been identified. In this article, we propose and construct a versatile model called the Tapered Probability Function (TPF) to address the significance of the regulated level of the air pollution.
The TPF is not popular among healthcare professionals, but it is popular among the geologists who deal with earthquakes. See Kagan and Schoenberg (2001) for details on the earthquake application of the TPF. Estimation and hypothesis testing of the parameters in the TPF are still technical challenges. To ease the mathematical and computational difficulties of the TPF, an innovative approach is devised, presented and explained in this article. New expressions for the methodology are derived, displayed and interpreted in the context of air pollution. This new approach is utilized to assess the significance of an estimated regulated level of the air pollution in major cities across the African, Asian, American, European and Oceanic continents. The contents of this article are made easy for applied healthcare professionals to use in their practice to test the significance of an estimated regulated level of the air pollution.
A threat to healthy living in this advanced electronic age of the 21st century is the hazard of air pollution. See Mateen and Brook (2011) for details about the health impacts of air pollution. What is air pollution? Air pollution refers to the existence of harmful visible or invisible substances, including particulates and molecules, in the earth's atmosphere. Air pollution has adverse effects on human health. The World Health Organization warned that air pollution in the year 2014 alone caused the premature death of 7 million people worldwide, with the highest death rate in India compared to other nations (see Mateen and Brook, 2011 for details).
The developing nations aspire to improve their standard of living. Their industrial or agricultural activities are a source of air pollution. The developed nations tend to maintain their higher standard of living, and their continuing automobile emissions cause air pollution. For different reasons, both the developing and the developed nations pollute the air. What do the air pollution data suggest about who pollutes more, the developed or the developing nations? Shanmugam and Hertelendy (2011) answered this question. Stronger policies to reduce the emission of air pollutants globally would help.
More and more industrial wastes are dumped into the air, land and water sources, and they cause tremendous illnesses. In this process, a significant amount of oxygen in the natural environment is depleted, with an increase of toxic substances in the air. The three mainly recognized culprits in air pollution are PM10, SO2 and NO2.
How does PM10 affect our health? When PM10 is inhaled, the particles evade the respiratory system's natural defenses and lodge deep in the lungs (Pope et al., 2002). Health problems begin to arise as the body reacts to these foreign particles. The "sensitive populations" to PM10 are children, elders, exercising adults and those suffering from asthma or bronchitis. The presence of PM10 increases the severity of asthma attacks, causes bronchitis, inflicts lung diseases and reduces the body's ability to fight infections (see Weinmayr et al., 2010 for details). See Figure 1 through Figure 3 for the trend of PM10 in major cities of the African, American, Asian, European and Oceanic continents.
SO2 denotes sulfur dioxide, a toxic chemical gas with a pungent, irritating smell. More often, it is released by volcanoes. In ancient times, Romans injected SO2 into wine bottles to keep wine fresh and free from bad smell. Inhaling SO2 causes respiratory difficulties and related lung diseases.
NO2 is a reddish-brown chemical gas with an odor and is often a by-product of human activities, volcanoes and lightning (see Figure 10, Figure 11 and Figure 12). NO2 absorbs sunlight and affects the ozone's thickness. Abundant NO2 in the air triggers skin cancer, mild irritation in the nose and throat, bronchitis, pneumonia and lung diseases.
It appears from the data (see Table 2 and Figure 13 for details) that the level of PM10 is higher when the level of SO2 is higher, and vice versa. Hence, this article selects and focuses on just the pollutant PM10 for data analytics and continental comparisons (see Figure 1 through Figure 5 for the trend in the African, American, Asian, European and Oceanic continents).
Data are collected worldwide to understand the patterns and sources of air pollution and the practical ways to reduce it, if not to eliminate it totally. However, the efforts are not always satisfactory. Why is it so? It is perhaps due to a lack of suitable data analytic methodology, which requires finding a suitable underlying model for the collected data. What is a model? A model is an abstraction of reality. The model has to be simple and yet powerful enough to capture the essence of reality. One such powerful model, which could help to capture and assess the regulated level of the air pollution, is the TPF.
Returning to the discussion of air pollutants, what is particulate matter, PM10? It refers to fine suspended particles (less than 10 microns in diameter, which is 1/7th the thickness of a human hair) capable of penetrating deep into the respiratory tract and causing health damage. The other two troubling air pollutants to healthy living are sulfur dioxide (SO2) and nitrogen dioxide (NO2). What is SO2? It is an air pollutant produced when fossil fuels containing sulfur are burned. It contributes to acid rain and it can damage human health, particularly that of the young and the elderly (see Counter and Buchanan, 2004 for details). SO2 is a toxic gas with a pungent, irritating smell, released naturally by volcanic activity. In ancient days, SO2 was introduced by burning sulfur candles inside empty wine vessels to keep the wine fresh and free from a vinegar smell. SO2 is noticeable in the regular atmosphere. SO2 is a precursor to acid rain and to the associated health problems.
What is NO2? NO2 is a chemical compound that arises as an intermediate in the industrial synthesis of nitric acid. At higher temperatures, NO2 appears as a reddish-brown gas with a biting odor. It is a paramagnetic, bent molecule. It is also introduced into the environment by several natural causes. It is used in space vehicles such as the Titan rockets by NASA and other space agencies. NO2 is a poisonous, pungent gas formed when nitric oxide combines with hydrocarbons and sunlight. See Weinmayr et al. (2010) for details.
What are the tolerance levels of the air pollutants? The World Health Organization (WHO) recommends no more than 20 micrograms per cubic meter for PM10, no more than 40 micrograms for NO2 and no more than 20 micrograms per cubic meter for SO2 (http://apps.who.int). In accordance with the US Clean Air Act, the Environmental Protection Agency (EPA) often reviews the national air quality standards for PM10, SO2 and NO2. The remedies to control air pollution include dust control on roads, at building constructions and at landfills; landscaping, barriers and fencing to reduce windblown dust; programs to reduce emissions from wood stoves and fireplaces; cleaner-burning gasoline and diesel fuels; emission control devices for motor vehicles; and controls for industrial facilities.
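The tolerance levels quoted above can be applied to a set of readings with a few lines of code. The following is a minimal sketch: the function name `exceedances` and the sample readings are hypothetical, and the averaging period of each guideline, which the text does not state, is ignored.

```python
# WHO guideline values quoted in the text, in micrograms per cubic meter.
WHO_LIMITS = {"PM10": 20.0, "NO2": 40.0, "SO2": 20.0}


def exceedances(readings, limits=WHO_LIMITS):
    """Return the pollutants whose reading is above its guideline value."""
    return [name for name, value in readings.items()
            if name in limits and value > limits[name]]


# Hypothetical readings for one city (illustrative numbers, not real data).
city = {"PM10": 35.0, "NO2": 30.0, "SO2": 20.0}
print(exceedances(city))
```

A reading equal to the guideline is not flagged, because the recommendation is worded as "no more than".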
Extracting pertinent evidence from the collected air pollution data and learning the realities, using appropriate analytics to formulate healthy environmental policies or to execute regulating procedures, are technical stumbling blocks for practicing environmentalists on a daily basis. To alleviate such stumbling blocks, this article is prepared, articulated and demonstrated with real data on air pollution. One unique environmental policy for all cities worldwide is a feasible target.
In the next section, this article first formally introduces a versatile and powerful model called the Tapered Probability Function (TPF), with its statistical properties. The aim of this article is to construct a data analytic methodology for a statistical comparison of the cities across five continents (namely African, Asian, American, European and Oceanic) with respect to their air pollution levels. Neat new expressions are derived in this article for a hypothesis test of an estimated regulated level of the air pollution. The p-values and statistical powers for the continents are calculated and compared (see Abramowitz and Stegun, 1972 for details). By emulating the new methodology of this article, practicing environmentalists could analyze air pollution data of their own, and rank and classify the places into different groups.

How Can the Tapered Frequency Distribution Assess the Air Pollution Level?
To begin with, let Y be a random variable indicating the air pollution level in a city at a specified time. With the parameters θ>0, τ≥0 and φ≥0 denoting respectively the severity level, a threshold level and a regulated level of the air pollution pattern, let the probability density function (1), with its survival function (2), be the underlying model for the data on Y. The model (1) is called the Tapered Probability Function (TPF). A reason for selecting the TPF (1) is its versatility to portray the regulated level of the air pollution. The TPF is quite familiar to the geologists who unravel the mysteries behind earthquakes. The TPF is not that familiar to the environmentalists or healthcare professionals who deal with air pollution or its impact on health. Some heuristic introductions of the TPF are worthwhile.
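For readers who want to experiment numerically, the following is a minimal sketch of a density and survival function for the model. It assumes the tapered Pareto form used in the earthquake literature (Kagan and Schoenberg, 2001), namely a survival function (τ/y)^φ exp((τ − y)/θ) for y ≥ τ. That form is an assumption here: it is consistent with the Pareto and guaranteed exponential special cases discussed later in this article, but it may differ in notation from expressions (1) and (2). The function names and parameter values are illustrative only.

```python
import math


def tpf_survival(y, tau, theta, phi):
    """Assumed survival function: (tau/y)**phi * exp((tau - y)/theta) for y >= tau."""
    if y < tau:
        return 1.0
    return (tau / y) ** phi * math.exp((tau - y) / theta)


def tpf_pdf(y, tau, theta, phi):
    """Assumed density: (phi/y + 1/theta) times the survival function, zero below tau."""
    if y < tau:
        return 0.0
    return (phi / y + 1.0 / theta) * tpf_survival(y, tau, theta, phi)


def integrate(func, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n equal sub-intervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


# Illustrative parameter values (not estimates from the article's data).
tau, theta, phi = 10.0, 25.0, 0.8
upper = tau + 60.0 * theta  # the tail beyond this point is negligible
area = integrate(lambda y: tpf_pdf(y, tau, theta, phi), tau, upper)
print(round(area, 6))
```

The numerical area should be close to one, which is the property a bona fide density must satisfy.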
First, is f(y|τ, θ, φ) in (1) a bona fide probability function? To be bona fide, a function should be non-negative and the area under the function should add up to one. For the specified parametric space, the function $f(y\mid\tau,\theta,\varphi)$ is non-negative. In addition, the area under the non-negative function is one because $\int_{\tau}^{\infty} f(y\mid\tau,\theta,\varphi)\,dy = 1$. What does the tail value at risk, TVaR (3), refer to? Given that the air pollution exceeds a tolerance mark $y_p$ (recall that asthma and other respiratory patients in cities like Beijing, Denver, Delhi and Albuquerque, among other global cities, are advised not to leave home when the pollution level is too dangerous), the average pollution level (see Table 1) on the day is what the expression (3) indicates. Notice that it increases as the regulated parameter φ≥0 increases. Of course, the severity parameter θ>0 also plays a role. Furthermore, on such a day when asthma and other respiratory patients are advised not to leave their home, will the pollution not worsen, staying within an extra level "s"? This is called the hope probability (4). See Figure 6 for the dynamics of the hope probability on the z-axis, with θ on the x-axis and φ on the y-axis, for m = 10 and s = 1.
Two things need to be noticed in the hope probability (4): (1) when the regulated level φ diminishes to zero due to complete control, or as the severity parameter θ diminishes, the hope probability is smaller, and (2) the hope probability is smaller when the added amount "s" is larger.
However, if the air pollution exceeds the tolerance mark "t", will it remain within the safety mark "m"? This is called the safety probability (5). The safety probability (5) vanishes when the tolerance and safety marks are close. When the regulated parameter φ approaches zero (due to strict environmental regulations), the safety probability (5) converges to a finite non-zero amount, which is not zero unless the severity rate θ is extremely large. For a larger pollution severity (that is, θ→∞), the safety probability (5) converges to the amount $1-(t/m)^{\varphi}$, depending only on the regulated parameter φ. Such observations warn us of the importance of dealing simultaneously with both the severity and the regulated parameters in discussions of air pollution. How much memory of the past air pollution pattern is kept by nature? That is, if the air pollution has crossed a tolerance mark m, how probable is it for the pattern to continue so that the air pollution exceeds an additional allowance "S"? An answer to this question resides in the memory function (6). In (6), realize that nature's decreasing memory is expedited by two different co-factors. The first factor accommodates the influence of the regulated parameter φ. The second factor accommodates the influence of the severity parameter θ. See Figure 7 for how the dynamics of nature's memory change as both the severity and regulated parameters shift. We would have missed the above observations without using the TPF (1). So far, we have witnessed several advantages of using the TPF (1) to capture and interpret air pollution. The mathematical difficulties of the TPF do continue as we proceed to construct a data analytic methodology.
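The safety probability and the memory function can be explored numerically. The sketch below assumes the survival function (τ/y)^φ exp((τ − y)/θ) for y ≥ τ, which is an assumption about the form of (2); under it the threshold τ cancels in both conditional probabilities, and the memory function splits into one factor driven by φ and one driven by θ, as described above. Function names and numbers are illustrative.

```python
import math


def safety_probability(t, m, theta, phi):
    """Pr(Y <= m | Y > t) = 1 - (t/m)**phi * exp(-(m - t)/theta), for tau <= t <= m."""
    return 1.0 - (t / m) ** phi * math.exp(-(m - t) / theta)


def memory(m, s, theta, phi):
    """Pr(Y > m + s | Y > m) written as the product of its two co-factors."""
    regulated_factor = (m / (m + s)) ** phi   # influence of the regulated parameter
    severity_factor = math.exp(-s / theta)    # influence of the severity parameter
    return regulated_factor * severity_factor


# Illustrative marks and parameters (not estimates from the article's data).
t, m, theta, phi = 20.0, 50.0, 25.0, 0.8
print(safety_probability(t, m, theta, phi))
print(safety_probability(t, m, 1e12, phi), 1.0 - (t / m) ** phi)  # theta very large
print(memory(m, 5.0, theta, phi))
```

With a very large θ the second line prints two nearly identical numbers, which illustrates the limit $1-(t/m)^{\varphi}$ stated in the text.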
To begin realizing it, note that the mean, $E(y\mid\tau,\theta,\varphi)$, and the variance, $\mathrm{Var}(y\mid\tau,\theta,\varphi)$, of the TPF (1) are given in (7) and (8), where φθ = τ; hence, the pollution level shifts from the threshold, τ. The chi-squared distribution function, $\Pr(\chi^2_{m\,df} > a)$, has been widely tabulated and is readily available. Usually, the Moment Estimators (ME) are easier to obtain than the Maximum Likelihood Estimators (MLE). However, for the TPF (1), even the MLEs are difficult, as is seen from (7) and (8). The threshold τ must not exceed the smallest observation, $y_{(1)}$, for the likelihood function to be non-zero. Then, the log-likelihood function $\ln L(\tau,\theta,\varphi)$ is formed. The expression (12.b) may be heuristically interpreted as follows. Recall that the parameters θ and φ respectively portray the severity and the regulated level of the air pollution. In practice, when the air pollution's severity is higher, the regulated level must have been lower, and vice versa.
The expression (12) asserts a balance, for a specified threshold τ, between the estimated severity and the estimated regulated level. Because of (12), the remaining expressions simplify. To find the initial value of $\hat{\varphi}_{MLE}$, we consider a limiting scenario, $\ln L(\tau,\theta,\varphi) \to \infty$, of the likelihood function. After algebraic simplifications, the MLEs are then obtained.
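When closed-form expressions are not at hand, the likelihood can also be maximized numerically. The sketch below is not the derivation of this article: it swaps in a plain grid search, assumes the density (φ/y + 1/θ)(τ/y)^φ exp((τ − y)/θ) for y ≥ τ, takes the smallest observation as the threshold estimate, and uses simulated data. All names, the grid and the seed are hypothetical.

```python
import math
import random


def log_likelihood(data, tau, theta, phi):
    """Log-likelihood under the assumed tapered Pareto density."""
    if tau > min(data):
        return float("-inf")  # the likelihood is zero if the threshold exceeds an observation
    return sum(math.log(phi / y + 1.0 / theta)
               + phi * math.log(tau / y)
               + (tau - y) / theta
               for y in data)


def simulate(n, tau, theta, phi, rng):
    """The minimum of a Pareto draw and a shifted exponential draw has the assumed survival function."""
    return [min(tau * rng.paretovariate(phi), tau + rng.expovariate(1.0 / theta))
            for _ in range(n)]


def grid_mle(data, thetas, phis):
    """Grid search over (theta, phi), with the threshold fixed at the sample minimum."""
    tau_hat = min(data)
    best = max((log_likelihood(data, tau_hat, th, ph), th, ph)
               for th in thetas for ph in phis)
    return tau_hat, best[1], best[2], best[0]


rng = random.Random(2024)  # fixed seed so the sketch is repeatable
data = simulate(500, tau=10.0, theta=25.0, phi=0.8, rng=rng)
thetas = [5.0 * k for k in range(1, 13)]  # candidate severity values 5, 10, ..., 60
phis = [k / 10 for k in range(1, 21)]     # candidate regulated levels 0.1, ..., 2.0
tau_hat, theta_hat, phi_hat, ll_hat = grid_mle(data, thetas, phis)
print(tau_hat, theta_hat, phi_hat)
```

A grid search is crude but transparent; in practice a finer grid or a numerical optimizer started from the grid maximizer would refine the estimates.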

Fig. 8. Nonlinear balancing relation τ in z-axis, φ in x-axis and θ in y-axis
where $I_{2\times 2}$ is the expected Fisher's information matrix and $\partial^2_{ij}$ is the second derivative with respect to i and j. The E(Y) is derived in (7). The determinant of the Fisher's information matrix follows from these entries. What is the hazard function for the TPF? The hazard function is the ratio of the probability function over the survival function. In survival analysis, the hazard function is recognized as the failure rate function. The hazard function (19) for the TPF (1) is a much simpler expression. See Figure 7 for the graphics of (19). The hazard function (19) is impacted by both the severity and the regulated level of the air pollution. See Patel and Schoenberg (2011) for details about the graphical approaches. In a scenario of higher air pollution severity (that is, θ→∞), the regulated level has the full marginal advantage on the hazard (that is, the limit of the hazard function as θ→∞ depends only on φ). In a scenario of air pollution completely controlled and reduced by regulation (that is, φ→0), the severity has the full marginal advantage on the hazard function (that is, the limit of the hazard function as φ→0 depends only on θ). To advise asthma and other respiratory patients, healthcare professionals wonder, at times, whether an estimated regulated level, φ, of the air pollution is statistically significant. This amounts to testing the null hypothesis $H_0: \varphi = 0$. Hence, the variance of the estimated regulated level is needed. Incidentally, the correlation between the MLEs of the air pollution's severity rate and the regulated level is given in (22.b). Interestingly, the correlation (22.b) points out that when the air pollution's regulated level (that is, φ) is low, the air pollution's severity (that is, θ) is higher, and it makes sense. Also, we notice from (22.b) that such a correlation is inversely proportional to the detected variability, Var, of the estimated regulated level, $\hat{\varphi}$. In other words, more variability in the estimated regulated level lessens the correlation between the estimated regulated level and the estimated severity rate.
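The two limiting statements about the hazard function can be checked numerically. The sketch below assumes the survival function (τ/y)^φ exp((τ − y)/θ), under which the hazard is φ/y + 1/θ; this closed form is an assumption about (19), not a quotation of it. The code compares the closed form with a numerical derivative of the negative log-survival and then evaluates the two limits. Names and values are illustrative.

```python
import math


def log_survival(y, tau, theta, phi):
    """Logarithm of the assumed survival function, for y >= tau."""
    return phi * math.log(tau / y) + (tau - y) / theta


def hazard(y, theta, phi):
    """Assumed hazard: phi/y + 1/theta (the threshold tau drops out)."""
    return phi / y + 1.0 / theta


def numerical_hazard(y, tau, theta, phi, eps=1e-4):
    """Hazard as minus the derivative of the log-survival, by central differences."""
    return -(log_survival(y + eps, tau, theta, phi)
             - log_survival(y - eps, tau, theta, phi)) / (2.0 * eps)


y, tau, theta, phi = 20.0, 10.0, 25.0, 0.8  # illustrative values only
print(hazard(y, theta, phi), numerical_hazard(y, tau, theta, phi))
print(hazard(y, 1e12, phi))   # theta very large: only phi/y remains
print(hazard(y, theta, 0.0))  # phi equal to zero: only 1/theta remains
```

The first line prints two nearly equal numbers; the last two lines illustrate that each parameter alone drives the hazard when the other is at its extreme.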
We reject the null hypothesis when the test statistic exceeds the critical value, which is based on the 100(1-α)th percentile of the central chi-squared distribution and a significance level α∈(0,1). We now write the p-value (23) for rejecting the null hypothesis in favor of an alternative hypothesis. The expression (24) follows a non-central chi-squared probability distribution with one df and a noncentrality parameter. We now turn to discuss two extreme scenarios of the above mentioned data analytics. The first scenario is one in which the air pollution severity is highest (that is, θ→∞). The second scenario is one in which the air pollution's regulated level is smallest (that is, φ→0).
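The chi-squared calculations behind a p-value and a power can be carried out with the Python standard library when the distribution has one degree of freedom. The sketch below assumes one df for the central distribution as well (the text states one df only for the non-central case), and it takes the test statistic and the noncentrality parameter as given inputs, since their expressions are not reproduced here. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_1df_sf(x):
    """Pr(central chi-squared with 1 df > x), using chi-squared(1) = Z**2."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_1df_critical(alpha):
    """100(1 - alpha)th percentile of the central chi-squared with 1 df."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return z * z


def power_noncentral_1df(noncentrality, alpha):
    """Pr(non-central chi-squared, 1 df > critical value), using (Z + sqrt(lambda))**2."""
    c = math.sqrt(chi2_1df_critical(alpha))
    d = math.sqrt(noncentrality)
    return (1.0 - _STD_NORMAL.cdf(c - d)) + _STD_NORMAL.cdf(-c - d)


alpha = 0.05
statistic = 5.2        # hypothetical value of the test statistic
noncentrality = 6.0    # hypothetical noncentrality parameter
print(chi2_1df_critical(alpha))
print(chi2_1df_sf(statistic))                      # p-value
print(power_noncentral_1df(noncentrality, alpha))  # power
```

With a noncentrality of zero the power reduces to the significance level, which is a quick sanity check on the formulas.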

Scenario 1 (Pareto Distribution)
When the air pollution severity rate is highest (that is, θ→∞), the TPF (1) becomes the Pareto PDF $f(y\mid\tau,\varphi) = \varphi\,\tau^{\varphi}/y^{\varphi+1}$, $y \ge \tau$, as a special case, with the survival function $(\tau/y)^{\varphi}$. The mean and variance are $\varphi\tau/(\varphi-1)$ for φ>1 and $\varphi\tau^{2}/[(\varphi-1)^{2}(\varphi-2)]$ for φ>2. What is the Pareto distribution? Pareto (1897) introduced a power law (later called the Pareto distribution in his name) to describe the differential allocation of wealth in society. It provides a better-fitting alternative when other distributions (such as the lognormal, half-normal, exponential and Frechet) fail to fit a less heavy-tailed frequency pattern. The hope probability (4) simplifies accordingly. However, the safety probability (5) becomes $1-(t/m)^{\varphi}$. How much memory (6) of the past air pollution pattern is kept by nature? In this scenario, the memory function (6) and the hazard function (19) retain only the factor contributed by the regulated parameter φ.
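The convergence to the Pareto case can be seen numerically. The sketch below assumes the tapered survival function (τ/y)^φ exp((τ − y)/θ), which is an assumption about the form of (2), and compares it with the Pareto survival function (τ/y)^φ as θ grows. Names and values are illustrative.

```python
import math


def tpf_survival(y, tau, theta, phi):
    """Assumed tapered survival function for y >= tau."""
    return (tau / y) ** phi * math.exp((tau - y) / theta)


def pareto_survival(y, tau, phi):
    """Pareto survival function with threshold tau and index phi, for y >= tau."""
    return (tau / y) ** phi


def pareto_mean(tau, phi):
    """Pareto mean, finite only when phi > 1."""
    return phi * tau / (phi - 1.0) if phi > 1.0 else math.inf


y, tau, phi = 30.0, 10.0, 1.5  # illustrative values only
for theta in (1e1, 1e3, 1e6, 1e12):
    gap = abs(tpf_survival(y, tau, theta, phi) - pareto_survival(y, tau, phi))
    print(theta, gap)  # the gap shrinks as theta grows
print(pareto_mean(tau, phi))
```

The taper factor exp((τ − y)/θ) tends to one as θ grows, so only the power-law part remains.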

Scenario 2 (Guaranteed Exponential Distribution):
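When the air pollution's regulated level is negligible (that is, φ→0), the TPF (1) approaches the guaranteed exponential PDF $f(y\mid\tau,\theta) = \theta^{-1}\exp\{-(y-\tau)/\theta\}$, $y \ge \tau$. How much memory of the past air pollution pattern is kept by nature in this scenario? The memory function (6) retains only the factor contributed by the severity parameter θ. The mean, $E(y\mid\tau,\theta,\varphi)$ in (7), and the variance, $\mathrm{Var}(y\mid\tau,\theta,\varphi)$, reduce to τ+θ and θ², respectively. The MLE of the air pollution's severity simplifies likewise.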
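The guaranteed exponential scenario lends itself to a short numerical illustration. The sketch below assumes that the memory function under the TPF is (m/(m+s))^φ exp(−s/θ), which is an assumption about the form of (6), and shows that with φ = 0 it no longer depends on the mark m. It also uses the standard textbook estimators for a shifted exponential with an unknown threshold (sample minimum for the threshold, sample mean minus sample minimum for the scale); these may differ from the MLE expression of this article, which treats the threshold in its own way. Data are simulated and all names are hypothetical.

```python
import math
import random


def tpf_memory(m, s, theta, phi):
    """Assumed memory function Pr(Y > m + s | Y > m) under the tapered model."""
    return (m / (m + s)) ** phi * math.exp(-s / theta)


def exponential_memory(s, theta):
    """Memory function of the guaranteed exponential: it does not depend on m."""
    return math.exp(-s / theta)


print(tpf_memory(20.0, 5.0, 25.0, 0.0), tpf_memory(80.0, 5.0, 25.0, 0.0))

# Simulated guaranteed exponential data (illustrative parameters, fixed seed).
rng = random.Random(7)
tau, theta = 10.0, 25.0
sample = [tau + rng.expovariate(1.0 / theta) for _ in range(5000)]

tau_hat = min(sample)                             # threshold estimate
theta_hat = sum(sample) / len(sample) - tau_hat   # severity (scale) estimate
print(tau_hat, theta_hat)
```

The first line prints the same value twice, which is the memoryless property of the exponential tail.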

Pollution in Africa, America, Asia, Europe and Oceania
In this section, the data analytic methodology of Section 2 is illustrated using the air pollution data of cities in the African, American, Asian, European and Oceanic continents. Their minimum (Min) and average (Ave) numbers are displayed in Table 1. The Min and Ave of PM10 as well as of SO2 are higher in Asia, and it is a clue that the regulated level is low in Asia. The American continent has a higher average population in cities. A comparison of Fig. 8-10 reveals that in America and Europe, the NO2 is higher than the PM10 and SO2. The variations in the levels of NO2, PM10 and SO2 are about the same in Europe. In Asia, the levels of SO2 have a higher variation. In America, the levels of NO2 have a higher variation. It is interesting that Fig. 11 reveals that Africa is quite an outlier compared to the continents America, Asia, Europe and Oceania (including Australia and New Zealand) with respect to the pollutant PM10.
Note that a higher level of SO2 means a higher level of PM10 (see the correlations and their p-values in Table 2). The variations in the multivariate data are analyzed using the principal components. The first two principal components explain 66.7% of the total variation and show the proximity among the variables NO2, PM10 and SO2. The NO2 is more closely connected to the urban population. More SO2 is associated with a higher PM10 (see Fig. 9). Hence, this article focuses on the analysis and interpretation of only the PM10.
The minimum level $y_{(1)}$ of PM10 is lowest in the American continent and keeps rising in the other continents in the order of Oceania, Europe, Asia and Africa (see Table 3). The estimated severity, θ, is lowest in Oceania and keeps rising in the other continents America, Europe, Asia and Africa in that order (see Table 3). The estimated regulated level φ of the air pollution is lowest in Asia but is better in the other continents Europe, America, Africa and Oceania in that order (see Table 3). The expected shift, $\mathrm{Shift}_{\tau,\hat{\theta},\hat{\varphi}}$, of the air pollution is lowest in Africa but increases in the other continents America, Oceania, Europe and Asia in that order (see Table 3). The tail value at risk (TVaR) of the air pollution is given in (3), and it is the expected average in the tail of the frequency trend of the air pollution. The TVaR is lowest in America and increases in the other continents Europe, Asia, Oceania and Africa in that order (see Table 3). The hope probability portrays the chance for the air pollution to remain within a tolerance level if it has exceeded a warning level. The hope probability is best in Asia but declines in the other continents Africa, Europe, America and Oceania in that order (see Table 3). If we think of nature as a chance oriented system for emitting air pollution, the system's memory is lowest in Oceania but increases in the other continents Asia, Africa, America and Europe in that order (see Table 3). Remember that the estimated severity of the air pollution is inversely proportional to the estimated regulated level of the air pollution. This inverse relation is exhibited in their negative correlation. The estimated negative correlation is lowest in Oceania but increases in the other continents Africa, America, Europe and Asia in that order (see Table 3). The null hypothesis $H_0: \varphi = 0$ portrays an insignificant regulated level of the air pollution, and the p-value gauges the evidence for rejecting the null hypothesis.
The p-value is highest only in Europe but is smaller in all other continents Oceania, Africa, America and Asia. The power is the probability of accepting the alternative hypothesis $H_1: \varphi = \varphi_1$ in a hypothetical scenario in which the true value of the regulated level of the air pollution is $\varphi_1 = 0.5$. The power is best in Asia and Europe, is reasonable in Oceania but is worst in the African and American continents (see Table 3).
The statistical results of the data analytics identify the geometric distance among the continents in terms of the regulated, severity, TVaR, memory, shift, minimum and correlation levels (see Fig. 14), and it confirms the importance of the regulated level in the discussion of air pollution. Consequently, some continents are in closer proximity than others with respect to air pollution (see Figure 13, Figure 14 and Figure 15). The African continent stands alone as an outlier. The American continent is at the opposite end of the spectrum from the Asian continent, while in the middle are the European and Oceanic continents, which are very similar in their air pollution (see Fig. 15).

Comments and Conclusions
This article has developed new statistical properties of the tapered probability function and demonstrated their utility to analyze and interpret the air pollution trend of PM10. The derived analytical expressions helped to compare the African, American, Asian, European and Oceanic continents. In particular, the extreme occurrences of PM10 in these continents were captured and explained using the tail value at risk (TVaR), the probabilistic measures named hope probability and safety probability, a stochastic measure of the system's memory level, how much the pollution has shifted on the average and how it controls the volatility of the pollution. The maximum likelihood estimators for the severity level, the threshold level and the regulated level of the air pollution are obtained and applied to the data to capture and interpret the air pollution proximities among the continents. An expression for the hazard function of the air pollution is derived and demonstrated for all the continents (see Fig. 9).
An expression to quantify the correlation between the estimated severity level and the estimated regulated level of the air pollution has been derived and explained for all the continents. The locally most powerful likelihood ratio test methodology has been developed to examine whether the statistical estimate of the air pollution's regulated level in a continent is significant and to estimate the statistical power of the test for a specified true regulated level. With all these new expressions, the proximities among the continents with respect to the air pollution are assessed. Their graphical illustrations portray the importance of the computed results for the continents. Only the African continent is seen to be an outlier compared to the other continents. Special expressions are identified for the Pareto and guaranteed exponential distributions as particular cases of the general results in the article.
All these add to a better understanding of the air pollution in cities across the continents. This understanding will assist governing agencies to formulate and implement regulatory policies. Environmental professionals can easily mimic the approach of this article in their practice of controlling the air pollution in their jurisdiction. Healthcare professionals are helped by the analytic results to better prepare to deal with the health impacts of the worst air pollution before it ever occurs. Needless to mention, the residents who have been suffering in major cities with severe air pollution are eventually helped by the approach and methodology of this article.