The use of Principal Component Regression and Time Series Analysis to Predict Nitrous Oxide Emissions in Ghana

Corresponding Author: Benjamin Odoi
Department of Mathematical Science, University of Mines and Technology, Tarkwa, Ghana
E-mail: bodoi@umat.edu.gh

Abstract: The alarming rate at which Nitrous Oxide (N2O) is emitted into the atmosphere, and its harmful impact on the environment, as monitored by many governmental agencies and researchers, has become a source of concern for many nations and therefore needs due attention. This study used N2O emissions from three sectors, together with total N2O emissions, in Ghana over the period 1990 to 2016. The three sectors, namely the Energy sector, the Agriculture, Forestry and Other Land Use (AFOLU) sector and the Waste sector, were considered against total N2O emissions. Because multicollinearity was present among the input variables, Principal Component Regression (PCR) was applied to reduce them to a few principal components that explain the variation in the original dataset. Autoregressive Integrated Moving Average (ARIMA) models were developed to predict total N2O emissions and the emissions from each sector in Ghana. Based on the information criteria (AIC, AICc and BIC), the models that fitted the data well were ARIMA (1,2,1) and ARIMA (1,1,2). The ARIMA (1,2,1) model was found to be the most suitable for predicting N2O emissions from the Energy and Waste sectors. 70% of the dataset was used for the analysis, and the forecasted values mimic the original dataset. Based on the standardized coefficients, the AFOLU sector is the predominant contributor to overall N2O emissions into the atmosphere. The models were adequate: the MAPE for the AFOLU sector and for total N2O emissions were 2.95 and 2.68% respectively, corresponding to forecast accuracies (100% minus MAPE) of 97.05 and 97.32%. The predicted values mimic the current trend.


Introduction
The continuing rise in the planet's temperature is a serious concern, and its root cause is global warming. Global warming is the long-term heating of Earth's climate system due to human activities, primarily fossil fuel burning, which increases the levels of heat-trapping greenhouse gases in Earth's atmosphere. Of the sunlight reaching the Earth, clouds, water particles, reflective ground surfaces and the ocean surface send about 30% back into space, while the rest is absorbed by the seas, air and land (Jacobson, 2014). This absorbed energy heats the planet's surface and atmosphere and makes life possible. As the Earth warms, it re-emits this energy as thermal (infrared) radiation, which propagates out into space and cools the Earth down (Eppelbaum et al., 2014).
However, some of the outgoing radiation is absorbed in the atmosphere by nitrous oxide, carbon dioxide, water vapour, ozone, methane and other gases and is re-radiated back to the Earth's surface. Because of their heat-trapping capacity, these gases are commonly known as greenhouse gases (Shahzad, 2015).
Global warming has remained a topic of discussion and debate among politicians and the scientific community ever since it emerged in the early nineteenth century (Berlie, 2018). The 2009 Special Eurobarometer report adds that the world's most serious problems at the moment include global warming, poverty and international terrorism, and that most Europeans regard global warming as by far the most serious challenge compared with any other threat.
N2O is recognised as the most important ozone-depleting substance (Ravishankara et al., 2009). Nitrous Oxide (N2O) is the third most prevalent greenhouse gas (GHG), behind Carbon Dioxide (CO2) and methane (CH4). Since the early 1990s, its concentration in the atmosphere has steadily increased, and it has an atmospheric lifetime of 121 years. N2O also has a global warming potential about 300 times that of carbon dioxide over a 100-year timeframe (Griffis et al., 2017).
In 2016, total national greenhouse gas emissions in Ghana were 42.2 MtCO2e (based on nitrous oxide, carbon dioxide, methane, hydrofluorocarbons and perfluorocarbons). Nitrous oxide was the second largest greenhouse gas that year, constituting about 18.3% of the total. N2O is a very stable substance in the atmosphere, so its emissions can influence global atmospheric concentrations for several decades (Ogeya et al., 2018; Tiemeyer et al., 2016).
In fact, the findings of a recent scientific analysis show that nitrous oxide is the leading ozone-depleting agent currently released. Legislation to restrict nitrous oxide emissions could therefore contribute both to mitigating climate change and to the recovery of the ozone layer.
This study aims to use historical empirical data to examine the economic sectors that contribute to N2O emissions in Ghana and to make future predictions using ARIMA models, in order to help the Government of Ghana implement policies and strategies to limit nitrous oxide emissions within its borders.

Some Related Literature
The following are some related works considered under this study.
Nyoni and Bonga (2019) predicted Carbon Dioxide (CO2) emissions in India using the Box-Jenkins ARIMA approach over the period 1960 to 2017 and established that ARIMA (2,2,0) is the best-fitting model for predicting CO2 emissions in India. They also found that CO2 emissions in India are likely to increase, thereby exposing India to climate-related challenges.
Nyoni and Bonga (2019) used ARIMA in modelling and forecasting carbon dioxide emissions in China. They found that ARIMA (1,2,1) is the optimal model for forecasting carbon dioxide emissions in China and that CO2 emissions in China are likely to increase, thereby exposing China to a plethora of climate-change-related challenges. Rahman and Hagan (2017), using forty-four years of time series data (1972-2015) and ARIMA models, showed that ARIMA (0,2,1) is the best model for carbon dioxide modelling and prediction in Bangladesh. Hossain et al. (2017) forecasted carbon dioxide emissions in Bangladesh using the Box-Jenkins ARIMA technique over the period 1972-2013 and identified ARIMA (12,2,12), ARIMA (8,1,13) and ARIMA (5,1,5) as the best-fitting models for forecasting CO2 emissions from Gaseous Fuel Consumption (GFC), Liquid Fuel Consumption (LFC) and Solid Fuel Consumption (SFC) respectively, rather than the alternative Holt-Winters Non-Seasonal (HWNS) and Artificial Neural Network (ANN) models. Ismail and Abdullah (2016) combined Principal Component Regression (PCR) and Back-Propagation Neural Network (BPNN) techniques to improve the accuracy of electricity demand prediction. Mendeș (2009) used multiple linear regression models based on principal component scores to predict the slaughter weight of broilers. Thupeng et al. (2018) used Principal Component Regression (PCR) to predict, one day in advance, the daily maximum 1 h average ambient ground-level ozone concentration for Maun town. Mishra and Vanli (2016) used principal component regression to extract damage-sensitive features from a Lamb wave sensor signal and to establish a relation between the features and measured areas. Rahayu et al. (2017) used principal component analysis to reduce multicollinearity in the currency exchange rates of some Asian countries over the period 2004-2014.
Ganiyu and Zubairu (2010) developed a predictive cost model using principal component regression for public building projects in Nigeria. Haque et al. (2013) developed a principal component regression model, combining multiple linear regression and principal component analysis, to forecast future water demand in the Blue Mountains water supply system in New South Wales, Australia. Lall et al. (2016) used a principal component regression model to predict acceleration factors for copper-aluminum wire bonds subjected to harsh environments. Sousa et al. (2007) used multiple linear regression and artificial neural networks based on principal components to predict ozone concentrations. The aim of their study was to predict next-day hourly ozone concentrations through a new methodology based on feedforward artificial neural networks with principal components as inputs. They found that using principal components as inputs improved model prediction by reducing complexity and eliminating data collinearity. Asare et al. (2018) used the principal component regression method to predict the water level of the Akosombo Dam.

Methods Used
The general multiple linear regression model with response $Y$ and predictors $X_1, \ldots, X_n$ has the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon, \qquad (1)$$

where $\beta_0$ is the intercept (the point where the regression line crosses the y-axis), $\beta_1, \beta_2, \ldots, \beta_n$ are the regression coefficients associated with $X_1, X_2, \ldots, X_n$ respectively, and $\varepsilon$ is the error term. Each coefficient measures the effect of the corresponding predictor after taking account of the effect of all other predictors in the model.
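As a minimal illustration of how the coefficients in Eq. (1) are estimated by ordinary least squares, the following Python sketch (assuming NumPy is available) fits the model to a small synthetic dataset. The function name `fit_ols` and the data are hypothetical and are not taken from the study.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least-squares estimates of beta_0, ..., beta_n in Eq. (1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones makes the first coefficient the intercept beta_0.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta  # estimates of the error term epsilon
    return beta, residuals


# Synthetic data generated from y = 1 + 2*x1 - 0.5*x2 with no noise.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]])
y_demo = 1 + 2 * X_demo[:, 0] - 0.5 * X_demo[:, 1]
beta_hat, resid = fit_ols(X_demo, y_demo)
```

Because the synthetic response has no noise, `beta_hat` recovers the generating coefficients and the residuals are numerically zero; with real emissions data the residuals would carry the unexplained variation.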

Assumptions in Multiple Linear Regressions
Some assumptions are needed in the model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ for drawing statistical inferences. The following assumptions are made: i.
These assumptions are used to study the statistical properties of estimators of regression coefficients.

Multicollinearity
Multicollinearity occurs when two or more predictors in a regression model are moderately or highly correlated with one another, that is, when the predictors are correlated not only with the response variable but also with each other. The problem is more troublesome at smaller sample sizes, where standard errors are usually larger due to sampling error. Multicollinearity can be tackled with multivariate techniques such as principal component regression and factor analysis; this study uses principal component regression.
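A minimal sketch of principal component regression, assuming NumPy and standardised predictors (so that the analysis is based on the correlation matrix): compute the eigenvectors of the correlation matrix, regress the response on the scores of the first k components, and map the component coefficients back to the standardised predictors. The function name `pcr_fit` and the synthetic data are hypothetical and do not reproduce the Ghana emissions series.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression on standardised predictors, keeping k components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise so that the analysis is based on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]   # largest eigenvalue first
    V = eigvecs[:, order[:k]]           # loadings of the k retained components
    T = Z @ V                           # component scores: mutually uncorrelated regressors
    design = np.column_stack([np.ones(len(y)), T])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, gamma = coef[0], coef[1:]
    beta_std = V @ gamma                # implied coefficients on the standardised predictors
    return intercept, gamma, beta_std


# Hypothetical data: three strongly correlated "sector" series and their noisy total.
rng = np.random.default_rng(2)
common = rng.normal(size=(27, 1))
X_demo = common + 0.2 * rng.normal(size=(27, 3))
y_demo = X_demo.sum(axis=1) + 0.1 * rng.normal(size=27)
intercept, gamma, beta_std = pcr_fit(X_demo, y_demo, k=1)
```

With k equal to the number of predictors, PCR reproduces ordinary least squares on the standardised predictors; the benefit comes from dropping low-variance components. The sign of each eigenvector, and hence of `gamma`, is arbitrary, but `beta_std` does not depend on it.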
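Suppose that we have a random vector

$$\mathbf{X} = (X_1, X_2, \ldots, X_p)^\top$$

with population variance-covariance matrix

$$\operatorname{var}(\mathbf{X}) = \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix},$$

where $\sigma_{kl} = \operatorname{cov}(X_k, X_l)$. Consider the linear combinations

$$Y_i = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \qquad i = 1, \ldots, p.$$

Each of these can be thought of as a linear regression predicting $Y_i$ from $X_1, X_2, \ldots, X_p$. There is no intercept, but $e_{i1}, e_{i2}, \ldots, e_{ip}$ can be viewed as regression coefficients. Note that $Y_i$ is a function of the random data and so is also random. Collect the coefficients $e_{ij}$ into the vector

$$\mathbf{e}_i = (e_{i1}, e_{i2}, \ldots, e_{ip})^\top.$$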

First Principal Component (PC1)
The first principal component Y1 is the linear combination of X-variables (among all linear combinations) with maximum variance. It represents as much variation as possible in the data. Specifically, we define coefficients e11, e12,…,e1p for the first component in such a way that its variance is maximized, subject to the constraint that the sum of the squared coefficients is equal to one. This constraint is required so that a unique answer may be obtained.
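The constrained maximisation described above can be written out explicitly. The following is the standard Lagrange-multiplier argument, with $\boldsymbol{\Sigma}$ denoting the variance-covariance matrix of $\mathbf{X}$ and $\mathbf{e}_1 = (e_{11}, \ldots, e_{1p})^\top$; it is included as a worked derivation, not as a step reported by the authors.

```latex
\begin{aligned}
&\max_{\mathbf{e}_1}\; \operatorname{var}(Y_1) = \mathbf{e}_1^\top \boldsymbol{\Sigma}\, \mathbf{e}_1
\quad \text{subject to} \quad \mathbf{e}_1^\top \mathbf{e}_1 = 1, \\
&L(\mathbf{e}_1, \lambda) = \mathbf{e}_1^\top \boldsymbol{\Sigma}\, \mathbf{e}_1 - \lambda\,(\mathbf{e}_1^\top \mathbf{e}_1 - 1), \\
&\frac{\partial L}{\partial \mathbf{e}_1} = 2\boldsymbol{\Sigma}\,\mathbf{e}_1 - 2\lambda\,\mathbf{e}_1 = \mathbf{0}
\;\Longrightarrow\; \boldsymbol{\Sigma}\,\mathbf{e}_1 = \lambda\,\mathbf{e}_1, \\
&\operatorname{var}(Y_1) = \mathbf{e}_1^\top \boldsymbol{\Sigma}\, \mathbf{e}_1 = \lambda\,\mathbf{e}_1^\top \mathbf{e}_1 = \lambda .
\end{aligned}
```

So $\mathbf{e}_1$ must be an eigenvector of $\boldsymbol{\Sigma}$, and the variance of $Y_1$ is maximised by choosing the eigenvector that belongs to the largest eigenvalue $\lambda_1$.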

Second Principal Component (PC2)
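The second principal component $Y_2$ is the linear combination of the X-variables that accounts for as much of the remaining variation as possible, subject to the constraint that the correlation between the first and second components is 0. Select $e_{21}, e_{22}, \ldots, e_{2p}$ to maximise the variance of this new component,

$$\operatorname{var}(Y_2) = \sum_{k=1}^{p}\sum_{l=1}^{p} e_{2k}e_{2l}\sigma_{kl} = \mathbf{e}_2^\top\boldsymbol{\Sigma}\,\mathbf{e}_2,$$

subject to the constraint that the sum of squared coefficients adds up to one,

$$\mathbf{e}_2^\top\mathbf{e}_2 = \sum_{j=1}^{p} e_{2j}^2 = 1,$$

along with the additional constraint that these two components are uncorrelated,

$$\operatorname{cov}(Y_1, Y_2) = \sum_{k=1}^{p}\sum_{l=1}^{p} e_{1k}e_{2l}\sigma_{kl} = \mathbf{e}_1^\top\boldsymbol{\Sigma}\,\mathbf{e}_2 = 0,$$

where $\boldsymbol{\Sigma} = (\sigma_{kl})$ is the variance-covariance matrix of the X-variables. All subsequent principal components have the same property: they are linear combinations that account for as much of the remaining variation as possible and are uncorrelated with the other principal components.
The same procedure is carried out for each additional component: the $i$th component maximises $\operatorname{var}(Y_i) = \mathbf{e}_i^\top\boldsymbol{\Sigma}\,\mathbf{e}_i$ subject to $\mathbf{e}_i^\top\mathbf{e}_i = 1$ and $\operatorname{cov}(Y_j, Y_i) = \mathbf{e}_j^\top\boldsymbol{\Sigma}\,\mathbf{e}_i = 0$ for all $j < i$.
Therefore, all principal components are uncorrelated with one another. The variance of the $i$th principal component equals the $i$th largest eigenvalue $\lambda_i$ of the variance-covariance matrix $\boldsymbol{\Sigma} = \operatorname{var}(\mathbf{X})$:

$$\operatorname{var}(Y_i) = \mathbf{e}_i^\top\boldsymbol{\Sigma}\,\mathbf{e}_i = \lambda_i, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p,$$

where $\mathbf{e}_i$ is the normalised eigenvector associated with $\lambda_i$.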
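These properties can be checked numerically. The sketch below (Python with NumPy, synthetic data, hypothetical function name `principal_components`) performs PCA on the correlation matrix, as the study does, and returns eigenvalues in decreasing order together with the component scores. If the theory holds, the covariance matrix of the scores is diagonal with the eigenvalues on the diagonal.

```python
import numpy as np


def principal_components(X):
    """PCA on the correlation matrix: eigenvalues (descending), eigenvectors and scores."""
    X = np.asarray(X, dtype=float)
    # Standardising the columns makes the covariance of Z equal to the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs  # column i holds the values of Y_i = e_i^T Z
    return eigvals, eigvecs, scores


# Synthetic data: the first two columns share a common factor, the third is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 1))
X_demo = np.hstack([
    base + 0.1 * rng.normal(size=(30, 1)),
    2 * base + 0.3 * rng.normal(size=(30, 1)),
    rng.normal(size=(30, 1)),
])
lam, E, Y = principal_components(X_demo)
cumulative = np.cumsum(lam) / lam.sum()  # cumulative proportion of variance explained
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, so `cumulative` gives cumulative percentages of the kind reported in Table 5 (here for synthetic data only).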

Box-Jenkins Model Approach
The Box-Jenkins approach to time series forecasting is a step-by-step procedure that fits the best ARMA or ARIMA model to past values of a time series (Box et al., 1994). The basic steps in the Box-Jenkins methodology are:
i. differencing the series to achieve stationarity;
ii. identification of a tentative model;
iii. estimation of the model;
iv. diagnostic checking of the model; and
v. using the model for forecasting.
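For reference, the general ARIMA(p, d, q) model that these steps fit can be written with the backshift operator $B$ as below, together with the two specific forms selected later in the study. The MA polynomial is written here with the plus-sign convention used by several software packages; Box et al. (1994) write it with minus signs, which only flips the sign of the reported $\theta$ values.

```latex
\begin{aligned}
&\phi(B)\,(1 - B)^{d}\,Y_t = \theta(B)\,\varepsilon_t,
\qquad B\,Y_t = Y_{t-1}, \\
&\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p},
\qquad \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q}, \\[4pt]
&\text{ARIMA}(1,2,1):\quad (1 - \phi_1 B)(1 - B)^{2}\,Y_t = (1 + \theta_1 B)\,\varepsilon_t, \\
&\text{ARIMA}(1,1,2):\quad W_t = Y_t - Y_{t-1},\quad
W_t = \phi_1 W_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}.
\end{aligned}
```

Here $\varepsilon_t$ is white noise, $d$ is the number of differences needed for stationarity, and no constant term is included.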

Preliminary Analysis of Data
The descriptive statistics from the data and correlations existing among the variables considered in the study are displayed in Table 1. In Table 1 the average values (means), the deviations from the mean (standard deviations), the minimum and maximum value for each of the variables considered in the study have been presented.
From Fig. 1, the data exhibit one general class of relationship, namely positive correlations (shown in blue). In particular, the Waste sector and the Agriculture, Forestry and Other Land Use (AFOLU) sector are positively related to each other.
To ascertain the consequences of multicollinearity and to validate the need for dimension reduction (principal components), total nitrous oxide emissions were regressed on all three sectors. The result of the regression analysis is provided in Table 2. The p-value of the F-test (0.000) shows that the model is statistically significant (adequate), and the adjusted R-squared indicates that about 100% of the total variability of total nitrous oxide emissions is accounted for by the model. However, two of the variables have Variance Inflation Factor (VIF) values greater than 5, an indication of multicollinearity in the model. A direct multiple regression analysis would therefore yield unstable coefficient estimates that are unreliable for interpretation. To address the multicollinearity problem and perform a reliable regression analysis, Principal Component Analysis (PCA) is employed to eliminate the multicollinearity in the dataset. PCA also helps identify appropriate variables (principal components) to be used as independent variables.
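The VIF check above can be sketched as follows. For each predictor $X_j$, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ (with an intercept) on the remaining predictors. This is a minimal NumPy illustration on hypothetical data, not the software used in the study; the threshold of 5 is the one quoted in the text.

```python
import numpy as np


def vif(X):
    """Variance inflation factors VIF_j = 1 / (1 - R_j^2) for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical predictors: the second column is almost a multiple of the first.
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
X_demo = np.column_stack([x1, 3 * x1 + 0.05 * rng.normal(size=40), rng.normal(size=40)])
v = vif(X_demo)
flagged = [j for j, value in enumerate(v) if value > 5]  # columns suggesting multicollinearity
```

An equivalent shortcut is to read the VIFs off the diagonal of the inverse correlation matrix of the predictors; the regression form is shown because it makes the definition explicit.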

Formulation of Principal Components
The first step in formulating the principal components is to estimate the correlation matrix of the independent variables considered in the study. Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) test can also be used to determine whether principal component analysis is warranted; Table 3 shows the results of both tests. The p-value of Bartlett's test of sphericity is below 0.05, meaning that the variables are not orthogonal (they are correlated), and the overall KMO Measure of Sampling Adequacy (MSA) is 0.51, which exceeds the 0.5 threshold. Together, these tests suggest that the dataset is appropriate for principal component regression. Table 4 is the correlation matrix of the independent variables; it reveals a strong correlation between the Waste sector and the AFOLU sector.
Table 5 contains information on the three possible principal components and their relative explanatory power as expressed by their eigenvalues. As expected, the components are extracted in order of importance: Principal Components 1, 2 and 3 cumulatively explain 72.01, 97.93 and 100% of the variation in the dataset. Two criteria are used to decide on the number of components to retain: the latent root (eigenvalue) criterion and the proportion of variance explained. Using these criteria, one component is retained, which explains about 72.01% of the variation. The scree plot in Figure 2 also indicates that one component should be retained, since only the first component has an eigenvalue greater than one. Table 6 presents the principal component eigenvector; its column gives the loading of each variable on the single extracted principal component.
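The three checks described above (Bartlett's test of sphericity, the overall KMO measure and the latent-root rule) can be sketched in a few lines of NumPy. The sketch uses the textbook formulas $\chi^2 = -\left[(n-1) - \tfrac{2p+5}{6}\right]\ln|\mathbf{R}|$ with $p(p-1)/2$ degrees of freedom, and KMO based on partial correlations from $\mathbf{R}^{-1}$. Function names and data are hypothetical; the p-value can be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, df)` if SciPy is available.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's statistic and degrees of freedom for H0: the correlation matrix is the identity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, df


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    P = np.linalg.inv(R)
    # Partial correlation of each pair of variables given all the others.
    partial = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal entries only
    r2, a2 = np.sum(R[off] ** 2), np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


def kaiser_retained(X):
    """Number of correlation-matrix eigenvalues greater than one (latent-root criterion)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    return int(np.sum(eigvals > 1.0))


# Hypothetical data: three columns driven by one common factor.
rng = np.random.default_rng(3)
f = rng.normal(size=(27, 1))
X_demo = f + 0.3 * rng.normal(size=(27, 3))
stat, df = bartlett_sphericity(X_demo)
overall_kmo = kmo(X_demo)
n_keep = kaiser_retained(X_demo)
```

With one dominant common factor, the latent-root rule retains a single component, which mirrors the pattern the study reports for its three sectors.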
The result in Table 7 shows the varimax rotation for the component model, which makes the results easier to interpret. One main component was extracted to represent the three variables, and the rotated loadings show that one variable, the Waste sector, correlates strongly with this component. Table 8 shows the principal component regression: after extraction, the scores of the one main component (computed from its eigenvector) were used as the regressor. The F-statistic was statistically significant at the 5% level (F = 502.7, p-value = 0.0000), and the adjusted R-squared of approximately 95% indicates that the component explains about 95% of the variation in the dependent variable. The estimated Principal Component Regression (PCR) model fitted to the data is given as:

Time Series Analysis
The time series analysis was carried out on the three sectors and on total nitrous oxide emissions. Stationarity was assessed for each series with the KPSS (Kwiatkowski-Phillips-Schmidt-Shin), ADF (Augmented Dickey-Fuller) and PP (Phillips-Perron) tests, as shown in Table 9. After first differencing, the AFOLU sector and Total N2O series appear to be stationary, whereas the Energy and Waste sector series do not. After second differencing, the Energy and Waste sector series were stationary, as shown in Table 10.
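Why a second difference can be needed is easy to see with a deterministic example. The sketch below (plain Python, hypothetical helper `difference`) applies the operator $(1 - B)^d$: a series with a quadratic trend still trends after one difference and only becomes constant after two. The formal tests in the study are separate; if statsmodels is available, `statsmodels.tsa.stattools.adfuller` and `statsmodels.tsa.stattools.kpss` provide the ADF and KPSS tests.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing, i.e. (1 - B)^d Y_t."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


# A quadratic trend needs two differences before it becomes constant.
quadratic = [t * t for t in range(1, 8)]  # 1, 4, 9, ..., 49
first = difference(quadratic, d=1)        # 3, 5, 7, 9, 11, 13: still trending
second = difference(quadratic, d=2)       # 2, 2, 2, 2, 2: constant
```

Each round of differencing shortens the series by one observation, which matters for short annual records such as 1990 to 2016.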

Model Selection
The formulation of the ARIMA models was based on the information provided by the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
Since the AFOLU and Total N2O series are stationary after first differencing, their ACF plots show an autocorrelation at lag 1 that exceeds the significance bound, with all other autocorrelations below the bound, while the PACF shows that the partial autocorrelation at lag 1 exceeds the significance bounds. From these plots, the AR and MA terms can be identified. Since the ACF of the first difference cuts off after lag 1, an MA(1) term can be assumed.
The PACF of the first difference tails off after lag 1, so an AR(1) term can be assumed. Hence, the mixed model ARIMA (1,1,1) is formed by combining the AR and MA terms.
Again, the Energy and Waste sectors achieved stationarity at the second difference, and the AR and MA terms were identified in the same way. Since the ACF of the second difference cuts off after lag 1, an MA(1) term was assumed; likewise, the PACF tails off after lag 1, so an AR(1) term was also assumed. Hence, the mixed model ARIMA (1,2,1) is formed by combining the AR and MA terms. Figures 3 and 4 show the ACF and PACF for the Energy sector at the second difference.
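The quantities read off the ACF and PACF plots can be computed directly. The sketch below (plain Python) uses the usual sample autocorrelation, the Durbin-Levinson recursion for the partial autocorrelations and the approximate 95% significance bound $\pm 1.96/\sqrt{n}$ drawn on such plots. Function names and the demo series are hypothetical, and statistical packages may use slightly different defaults.

```python
import math


def acf(x, nlags):
    """Sample autocorrelations r_1, ..., r_nlags, using deviations from the sample mean."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0 for k in range(1, nlags + 1)]


def pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi_prev, out = [], []
    for k in range(1, nlags + 1):
        if k == 1:
            phi_kk = r[0]
            phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi_prev = phi
    return out


def significance_bound(n):
    """Approximate 95% bound +/- 1.96 / sqrt(n) used on correlograms."""
    return 1.96 / math.sqrt(n)


# Hypothetical differenced series (not the study's data).
w_demo = [0.4, -0.1, 0.3, 0.5, -0.2, 0.1, 0.6, 0.0, 0.2, 0.3]
r = acf(w_demo, 3)
partial = pacf(w_demo, 3)
bound = significance_bound(len(w_demo))
flagged_acf = [k + 1 for k, v in enumerate(r) if abs(v) > bound]  # lags outside the bound
```

A single spike at lag 1 in `r` with nothing beyond it is the "cut off after lag 1" pattern used above to propose an MA(1) term.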
After model identification, a model must be selected on the basis of its reliability for prediction. Three information criteria (AIC, AICc and BIC) were considered for model selection; the rule of thumb is that the best model is the one with the minimum information criteria. The analysis revealed that ARIMA (1,2,1) best fits the Energy and Waste sectors, while ARIMA (1,1,2) best fits the AFOLU sector and Total N2O. Table 11 shows the model selection criteria used to select a good predictive ARIMA model.
The estimated parameters and the best-fitted models based on the selection criteria for the Energy, AFOLU, Waste and Total N2O series are shown in Table 12 and Eqs. 19 to 22.
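The three criteria can be computed from a fitted model's maximised log-likelihood $\ln L$, its number of estimated parameters $k$ and the number of observations $n$: $\mathrm{AIC} = -2\ln L + 2k$, $\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$ and $\mathrm{BIC} = -2\ln L + k\ln n$. The sketch below applies the minimum-criterion rule; the log-likelihoods and parameter counts are hypothetical placeholders, not the values in Table 11, and software packages differ on whether $k$ includes the innovation variance.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, AICc and BIC for a model with k estimated parameters fitted to n observations."""
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * loglik + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def select_by(criterion, candidates, n):
    """Name of the candidate model with the smallest value of the chosen criterion."""
    return min(candidates, key=lambda name: information_criteria(*candidates[name], n)[criterion])


# Hypothetical (log-likelihood, parameter count) pairs, NOT the study's Table 11 values.
candidates = {"ARIMA(1,1,1)": (-40.2, 3), "ARIMA(1,1,2)": (-37.9, 4)}
best = {c: select_by(c, candidates, n=26) for c in ("AIC", "AICc", "BIC")}
```

AICc adds a small-sample correction to AIC and BIC penalises extra parameters more heavily as $n$ grows, so the three criteria can disagree; in this hypothetical case all three pick the same model.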
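The numerical coefficients of the fitted models are reported in Table 12 and are not reproduced here. As a hedged illustration of how a fitted ARIMA model turns into forecasts, the sketch below takes AR and MA coefficients as given, differences the series $d$ times, computes conditional residuals (pre-sample errors set to zero), forecasts the differenced series with future errors set to their expectation of zero, and then undoes the differencing. It assumes the plus-sign MA convention and no constant term; the function names, coefficients and series are hypothetical, and statistical packages use exact-likelihood methods that can give slightly different forecasts.

```python
def difference(series):
    """One round of first differencing: W_t = Y_t - Y_{t-1}."""
    return [b - a for a, b in zip(series[:-1], series[1:])]


def arima_forecast(y, phi, theta, d, h):
    """h-step forecasts from an ARIMA(p, d, q) with supplied (not estimated) coefficients.

    Model on the d-times differenced series, with no constant term:
    W_t = sum(phi_i * W_{t-i}) + e_t + sum(theta_j * e_{t-j}).
    """
    levels = [list(y)]
    for _ in range(d):
        levels.append(difference(levels[-1]))
    w = levels[-1]
    p, q = len(phi), len(theta)

    # Conditional residuals: pre-sample errors are zero, the first p values are taken as given.
    eps = [0.0] * len(w)
    for t in range(p, len(w)):
        ar = sum(phi[i] * w[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = w[t] - ar - ma

    # Forecast the differenced series; future errors are replaced by zero.
    w_ext, eps_ext = list(w), list(eps)
    for _ in range(h):
        t = len(w_ext)
        ar = sum(phi[i] * w_ext[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps_ext[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        w_ext.append(ar + ma)
        eps_ext.append(0.0)
    forecasts = w_ext[len(w):]

    # Undo the differencing one level at a time, starting from the last observed value.
    for level in reversed(levels[:-1]):
        last, integrated = level[-1], []
        for f in forecasts:
            last += f
            integrated.append(last)
        forecasts = integrated
    return forecasts


# Hypothetical ARIMA(1,2,1) coefficients and series (NOT the estimates in Table 12).
series = [1.0, 1.3, 1.9, 2.6, 3.6, 4.7, 6.1]
path = arima_forecast(series, phi=[0.3], theta=[-0.4], d=2, h=3)
```

With $d = 2$ and zero coefficients the forecast simply extends the last slope linearly, which shows how the integration step carries the trend forward.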
The fitted model for the Energy sector will be expressed as: the fitted model for the AFOLU sector will be expressed as: the fitted model for the Waste sector will be expressed as: the fitted model for the Total N2O will be expressed as:

Diagnostic Checking
The Ljung-Box test was used to check the standardized residuals for autocorrelation. The null hypothesis is that the residuals are not correlated, and the alternative is that they are correlated. The p-values for all the sectors were greater than 0.05, so the null hypothesis is not rejected and the residuals are taken to be uncorrelated, as shown in Table 13. This implies that the models are adequate.
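The Ljung-Box statistic over lags $1, \ldots, h$ is $Q = n(n+2)\sum_{k=1}^{h} \hat{r}_k^2/(n-k)$, where $\hat{r}_k$ is the lag-$k$ sample autocorrelation of the residuals. The plain-Python sketch below computes $Q$ only; under the null hypothesis it is compared with a chi-square distribution whose degrees of freedom are commonly taken as $h$ minus the number of fitted ARMA parameters (for example via `scipy.stats.chi2.sf` if SciPy is available, or with `statsmodels.stats.diagnostic.acorr_ljungbox`). The function name is hypothetical.

```python
def ljung_box(residuals, h):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(residuals)
    m = sum(residuals) / n
    dev = [e - m for e in residuals]
    c0 = sum(e * e for e in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0  # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# An alternating sequence is strongly autocorrelated, so Q is large relative to its length.
q_alt = ljung_box([1.0, -1.0, 1.0, -1.0], h=1)
```

Large values of Q (small p-values) indicate remaining autocorrelation in the residuals; the study's p-values above 0.05 correspond to small Q.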

Conclusion and Recommendation
The variance explained by the one main component was 72.01%, as indicated in Table 5. The approach presented here is efficient and appropriate for classifying the sectoral nitrous oxide emissions that make up total nitrous oxide emissions in Ghana. After the classification, total nitrous oxide emissions were regressed on the principal component, and the results show that PC1 has a significant impact on total nitrous oxide emissions. This means that higher nitrous oxide emissions from the Waste sector will have a significant impact on total emissions. From the standardized coefficients, the Agriculture, Forestry and Other Land Use (AFOLU) sector is the major contributor to overall nitrous oxide emissions, followed by the Waste sector and the Energy sector. The study also provided appropriate models for predicting N2O emissions from the three sectors and the annual total nitrous oxide emissions in Ghana. The findings establish that ARIMA (1,2,1) is the best-fitted model for predicting N2O emissions from the Energy and Waste sectors, while ARIMA (1,1,2) is the best-fitted model for predicting N2O emissions from the AFOLU sector and the annual total N2O emissions in Ghana. The models were deemed accurate for prediction based on their small Mean Absolute Percentage Error (MAPE) values. N2O emissions from the three sectors, and total N2O emissions, are expected to continue to increase.
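MAPE is the average absolute forecast error expressed as a percentage of the actual value, $\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$, and the figure $100 - \mathrm{MAPE}$ is the one reported alongside it in the Abstract (for example $100 - 2.95 = 97.05$). The sketch below assumes the errors are computed on held-out observations; the values are hypothetical, not the study's data. Note that $100 - \mathrm{MAPE}$ is a forecast-accuracy summary, not a proportion of variance explained.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical hold-out values (not the study's data).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error = mape(actual, forecast)   # (10% + 5% + 0%) / 3, i.e. about 5 percent
accuracy = 100.0 - error         # complementary accuracy figure
```

MAPE is undefined when an actual value is zero, which is not a concern for positive emission totals.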

Recommendation
In order to curb high nitrous oxide emissions from the Agriculture, Forestry and Other Land Use (AFOLU) sector, it is recommended to reduce nitrogen-based fertilizer application, adopt minimum tillage for cropping, reduce emissions from livestock and modify farm manure management practices. Policy makers should also put in place mechanisms to control nitrous oxide emissions from the other two sectors, the Energy and Waste sectors, since this study indicates that they are key factors in the increase in nitrous oxide emissions.

Author's Contributions
Benjamin Odoi: Participated in all experiments, coordinated the data-analysis and contributed to the writing of the manuscript. Coordinated the mouse work. Designed the research plan and organized the study.
Lewis Brew: Coordinated the mouse work. Designed the research plan and organized the study.
Christopher Attafuah: Designed the research plan and organized the study.

Ethics
We hereby state that this study is the authors' own original work, which has not been previously published elsewhere. The results are appropriately placed in the context of prior and existing research. All authors have been personally and actively involved in substantial work leading to the paper and will take public responsibility for its content.