Bayesian Changepoint Analysis of the Extreme Rainfall Events

Problem statement: This study assesses recent changes in extremes of annual rainfall in Peninsular Malaysia based on daily rainfall data for 50 rain-gauge stations over the period 1975-2004. Approach: Eight indices that represent extreme events are defined and analyzed: extreme Dry Spell (XDS), extreme Rain-Sum (XRS), extreme wet-day intensities at the 95th and 99th percentiles (I95 and I99), proportions of extreme rainfall amount to the total rainfall amount (R95 and R99) and frequencies of extreme wet days at the 95th and 99th percentiles (N95 and N99). A Bayesian approach based on a single shifting model is used to investigate changes in the mean level of these extreme rainfall indices. The analysis first detects whether a change has occurred and then estimates the location of the change point. Results: Half of the stations considered displayed significant changes. In general, the changes occurred in the early 1990s. More than 75% of the stations that recorded significant changes are situated on the west coast of the peninsula. Conclusion/Recommendations: The west coast of Peninsular Malaysia displays more significant changes in trend than the east coast, especially at stations located in urban areas. In terms of the Bayesian methods used, the existence of any outlier in the data series may influence the result, since the analysis is based on the mean value, which is very sensitive to outliers.


INTRODUCTION
Extreme precipitation events, which include extreme rainfall and extremely long spells of dry days (drought), are among the most disruptive atmospheric phenomena. These events may cause significant damage to agriculture, ecology and infrastructure, disruption of daily activities, accidents and loss of life. In Peninsular Malaysia, unpredictable rainfall events, which have increased in frequency lately, have caused damage costing millions of Malaysian ringgit. The increase in major floods, including flash floods, and in landslides over the last 10 years is believed to be due to the increase in rainfall intensities. On the other hand, prolonged dry conditions have forced the local authorities to impose water rationing, with negative impacts on daily life. In addition, in agricultural areas, crops can suffer damage from excess rainfall as well as from extreme dry spells.
In this context, this research provides insight into possible changes in rainfall extremes and extreme dry spells over the past 30 years, as measured by eight extreme indices. Any change in the trend of extreme rainfall has major implications for engineering, insurance, town planning and any other activity that assumes the climate has been stable over the last few decades. For example, the design of drainage, bridge, retaining wall and dam systems depends on the rainfall amount expected over a given duration. Earlier research on changes in these extreme rainfall indices in the same area, using data from only 8 rain-gauge stations, can be found in Zin et al. (2010). In that article, the extreme rainfall indices derived from rainfall data at these 8 stations over a period of 35 years were analysed for changes in trend. That study used a classical statistical approach: changes in trend were tested using the Mann-Kendall test and linear regression, while change points were identified using the Pettitt test. The present study uses 50 stations, numbered from 1 to 50, located at various places throughout Peninsular Malaysia, as shown in Fig. 1. These stations could represent the overall trend in extreme rainfall for the peninsula. We consider the data from 1975 to 2004 because this is the longest period for which a complete set of data is available for all stations. As described by Hosking and Wallis (2005), limited availability of long data records is not uncommon when the analysis is based on annual series. This situation is common in developing countries, where long records are often unavailable; however, studies still need to be done for various planning purposes, such as the construction of infrastructure.
Peninsular Malaysia experiences a tropical climate due to its location with respect to the equator and the influence of the monsoon seasons. It lies in the equatorial zone, between latitudes 1°N and 6°N and longitudes 100°E and 103°E. Throughout the year, the peninsula experiences wet and humid conditions, with daily temperatures ranging from 25.5 to 35°C. The two monsoons that contribute to the rainy seasons are the Southwest monsoon, occurring from May until September, and the Northeast monsoon, which occurs from November until March. The latter brings heavier rainfall to the peninsula, with the worst affected areas in the east and south. Between these two monsoons are the inter-monsoon periods, occurring in March-April and September-October, which bring intense convective rain to many areas of the peninsula. On the other hand, the driest period for the peninsula usually occurs during the Southwest monsoon, with the northern part experiencing, on average, relatively long dry spells. Reports of potential water rationing are common during the dry seasons. Apart from the two monsoons, the weather in Peninsular Malaysia is also influenced by two climatic phenomena, known as El Nino and La Nina, which are believed to cause abnormal weather conditions. In Malaysia, the El Nino phenomenon, which causes drier conditions, has occurred 12 times, with the worst episodes in 1982-83 and 1997-98. The worst weather conditions caused by La Nina in this country occurred in 1998-2000, resulting in an increase in the daily rainfall amount.

METHODOLOGY
In this research, the change point analysis focuses on detecting a change in the mean level of the time series and locating the change point. Classical methods for change point analysis include the non-parametric Wilcoxon test, the Student-t test and the sequential Mann-Kendall test. The alternative to these classical approaches is the Bayesian approach, which combines prior information, the assumed shift model and the observed data to form a posterior distribution. Statistical inferences on the unknown parameters, including the change point location, can then be made from this posterior distribution. The Bayesian approach to identifying a change in the mean level of a time series has been used by previous researchers such as Smith (1975), Lee and Heghinian (1977), Booth and Smith (1982), Perreault et al. (2000a; 2000b) and Kim et al. (2009). The Bayesian approach in this study is based on a single shifting model and uses non-informative prior distributions for the unknown change point.
In contrast to the classical approach, Bayesian methods treat the parameters of a model as random variables described by a statistical distribution (the prior distribution) rather than as fixed values. Bayesian methods combine the prior distribution with the most current information from the observations to form a posterior distribution. In other words, the prior distribution reflects beliefs about the parameters before experimentation, while the posterior distribution provides updated beliefs about the parameters after the sample data are obtained. The analysis includes estimating the mean values before and after the shift, the amount of change and the variation in the observations. In this study, two related problems are analysed: the detection of a change point and the estimation of its location.
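The prior-to-posterior update described above can be illustrated with a minimal sketch for the simplest case used later in the analysis: a Normal mean with known observation variance and a Normal prior. The function name and the symbols `mu0` and `sigma0_2` (prior mean and prior variance) are illustrative assumptions, not notation taken from the study.

```python
import numpy as np

def posterior_normal_mean(x, sigma2, mu0, sigma0_2):
    """Posterior mean and variance of a Normal mean.

    Assumes x_i ~ N(mu, sigma2) with sigma2 known and prior mu ~ N(mu0, sigma0_2).
    All argument names are illustrative, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    # Precisions (inverse variances) of the prior and of the data add up.
    precision = 1.0 / sigma0_2 + x.size / sigma2
    # Posterior mean is the precision-weighted mix of prior mean and data.
    mean = (mu0 / sigma0_2 + x.sum() / sigma2) / precision
    return mean, 1.0 / precision

# Hypothetical index values for three years.
obs = [10.0, 12.0, 14.0]
informative = posterior_normal_mean(obs, sigma2=4.0, mu0=0.0, sigma0_2=4.0)
diffuse = posterior_normal_mean(obs, sigma2=4.0, mu0=0.0, sigma0_2=1e6)
```

With a very large prior variance the posterior mean moves to the sample mean, which is the sense in which a diffuse Normal prior behaves as a non-informative prior.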
Several definitions are used to describe extreme precipitation events. In this study, eight extreme indices were examined based on daily rainfall data at 50 stations. Some of the indices chosen are standard extreme precipitation indices as defined by the Expert Team on Climate Change Detection, Monitoring and Indices (ETCCDMI).
The ETCCDMI indices considered in this study are maximum dry spell, maximum cumulative rain-sum, extreme intensities, extreme frequencies and extreme proportions (Table 1). A wet day is defined as a day with a rainfall amount of at least 1 mm. The dry spell index is the longest dry spell (consecutive days with rainfall less than 1 mm) recorded each year. The extreme rain-sum index measures the greatest cumulative rainfall amount received during a wet spell in a year. This index is included because flooding usually occurs when infrastructure is unable to accommodate the excess water during prolonged and continuous rainy days. The 95th and 99th percentiles are selected as the thresholds that represent extreme events. Extreme proportion measures the proportion of the extreme rainfall amount to the total annual rainfall received. Extreme frequency is a count of rainfall events per year that are equal to or above the long-term mean of the 95th and 99th percentiles. We refer to rainfall exceeding the 95th percentile as very wet days and rainfall exceeding the 99th percentile as extremely wet days.
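The index definitions above can be sketched in code for one year of daily data. This is an illustrative sketch under stated assumptions rather than the study's procedure: the function names are hypothetical, the percentile thresholds `p95` and `p99` are assumed to be supplied as long-term values in mm, and intensities are computed as the mean of wet-day amounts at or above each threshold.

```python
import numpy as np

WET_DAY_MM = 1.0  # wet-day threshold stated in the text

def longest_run(mask):
    """Length of the longest run of True values in a boolean sequence."""
    best = current = 0
    for flag in mask:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def max_wet_spell_total(rain):
    """Largest rainfall total accumulated over consecutive wet days."""
    best = current = 0.0
    for r in rain:
        current = current + r if r >= WET_DAY_MM else 0.0
        best = max(best, current)
    return best

def annual_indices(rain, p95, p99):
    """Eight extreme indices for one year of daily rainfall (mm).

    p95 and p99 are assumed long-term percentile thresholds (mm);
    how they are estimated is left to the caller.
    """
    rain = np.asarray(rain, dtype=float)
    wet = rain >= WET_DAY_MM
    total = rain.sum()  # annual total rainfall
    out = {"XDS": longest_run(~wet), "XRS": max_wet_spell_total(rain)}
    for label, threshold in (("95", p95), ("99", p99)):
        extreme = rain[wet & (rain >= threshold)]
        out["N" + label] = int(extreme.size)                       # frequency
        out["I" + label] = float(extreme.mean()) if extreme.size else 0.0  # intensity
        out["R" + label] = 100.0 * extreme.sum() / total if total > 0 else 0.0  # proportion (%)
    return out

# Ten hypothetical daily totals (mm) standing in for a full year.
example = annual_indices([0, 0, 5, 10, 0, 0, 0, 20, 0.5, 30], p95=20.0, p99=30.0)
```

Applying `annual_indices` to each year of a station record would give the eight annual series that the change point analysis is run on.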
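Table 1: Definitions of the extreme rainfall indices
Index | Definition
Dry spell (XDS) | Longest dry spell (consecutive days with rainfall less than 1 mm) in a year (days)
Extreme Rain-Sum (XRS) | The maximum cumulative total rainfall collected during a wet spell in a year (mm)
Very wet day intensity (I95) | Average intensity of events greater than or equal to the 95th percentile, i.e. average of the eighteen wettest events (mm)
Extremely wet day intensity (I99) | Average intensity of events greater than or equal to the 99th percentile, i.e. average of the four wettest events (mm)
Very wet day proportion (R95) | Percentage of annual total rainfall from events greater than or equal to the 95th percentile (%)
Extremely wet day proportion (R99) | Percentage of annual total rainfall from events greater than or equal to the 99th percentile (%)
Very wet day frequency (N95) | Number of rainfall events in a year equal to or above the long-term 95th percentile
Extremely wet day frequency (N99) | Number of rainfall events in a year equal to or above the long-term 99th percentile

Consider a hydrological time series $X = (x_1, \ldots, x_n)$. Under the single shifting model, the series is assumed to follow Eq. 1 and 2:

$$x_i = \mu_1 + \varepsilon_i, \quad i = 1, \ldots, \tau \tag{1}$$
$$x_i = \mu_2 + \varepsilon_i, \quad i = \tau + 1, \ldots, n \tag{2}$$

where the errors $\varepsilon_i$ are independent Normal random variables with mean zero and variance $\sigma^2$. The parameters $\tau$, $\mu_1$, $\mu_2$ and $\sigma^2$ represent the change point, the mean before the shift, the mean after the shift and the variance of the series, respectively. The prior distributions of $\mu_1$ and $\mu_2$ are assumed to be the same Normal distribution, denoted by Eq. 3 and 4:

$$\mu_1 \sim N(\mu_0, \sigma_0^2) \tag{3}$$
$$\mu_2 \sim N(\mu_0, \sigma_0^2) \tag{4}$$

With large $\sigma_0^2$, these distributions approach non-informative prior distributions. The variance of the series, $\sigma^2$, is assumed constant and is estimated from the observed series (Eq. 5).

After a sample of the time series $X$ is observed, the posterior distributions of the mean levels $\mu_1$ and $\mu_2$ can be determined using Bayes theorem. For a given change point $\tau$, the Normal prior and Normal likelihood give Eq. 6 and 7:

$$\mu_1 \mid X, \tau \sim N\!\left( \frac{\sigma^2 \mu_0 + \sigma_0^2 \sum_{i=1}^{\tau} x_i}{\sigma^2 + \tau \sigma_0^2},\; \frac{\sigma^2 \sigma_0^2}{\sigma^2 + \tau \sigma_0^2} \right) \tag{6}$$
$$\mu_2 \mid X, \tau \sim N\!\left( \frac{\sigma^2 \mu_0 + \sigma_0^2 \sum_{i=\tau+1}^{n} x_i}{\sigma^2 + (n - \tau) \sigma_0^2},\; \frac{\sigma^2 \sigma_0^2}{\sigma^2 + (n - \tau) \sigma_0^2} \right) \tag{7}$$

The likelihood function for $\tau$, $\mu_1$ and $\mu_2$ follows from the model in Eq. 1 and 2:

$$L(\tau, \mu_1, \mu_2 \mid X) = (2 \pi \sigma^2)^{-n/2} \exp\!\left\{ -\frac{1}{2 \sigma^2} \left[ \sum_{i=1}^{\tau} (x_i - \mu_1)^2 + \sum_{i=\tau+1}^{n} (x_i - \mu_2)^2 \right] \right\}$$

Using Bayes theorem, the posterior distribution of the change point location $\tau$ is Eq. 8:

$$p(\tau \mid X) = \frac{p(X \mid \tau)\, p(\tau)}{\sum_{t} p(X \mid t)\, p(t)} \tag{8}$$

where $p(X \mid \tau)$ is the likelihood averaged over the prior distributions of $\mu_1$ and $\mu_2$ and $p(\tau)$ is the prior distribution of the change point. Whether a shift has occurred or not can be checked using the Bayes factor, $B$ (Eq. 10), in which $p$ is a constant such that $0 \le p \le 1$.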
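As an illustration of how a posterior distribution for the change point location can be evaluated under a single shifting model, the sketch below integrates the two segment means out analytically and normalises over the candidate change points. It assumes a known variance `sigma2`, a common Normal prior with hypothetical parameters `mu0` and `sigma0_2` on both means, and a uniform prior on the change point; these choices and all function names are assumptions for illustration, not necessarily those of the study.

```python
import numpy as np

def segment_log_marginal(x, sigma2, mu0, sigma0_2):
    """Log marginal likelihood of one segment with its mean integrated out.

    Assumes x_i ~ N(mu, sigma2) and mu ~ N(mu0, sigma0_2).
    """
    m = x.size
    xbar = x.mean()
    ss = np.sum((x - xbar) ** 2)  # within-segment sum of squares
    return (-0.5 * m * np.log(2.0 * np.pi * sigma2)
            - 0.5 * np.log1p(m * sigma0_2 / sigma2)
            - 0.5 * ss / sigma2
            - 0.5 * m * (xbar - mu0) ** 2 / (sigma2 + m * sigma0_2))

def changepoint_posterior(x, sigma2, mu0, sigma0_2):
    """Posterior P(tau = t | x) for t = 1, ..., n-1 under a uniform prior on tau."""
    x = np.asarray(x, dtype=float)
    n = x.size
    log_post = np.array([
        segment_log_marginal(x[:t], sigma2, mu0, sigma0_2)
        + segment_log_marginal(x[t:], sigma2, mu0, sigma0_2)
        for t in range(1, n)
    ])
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Synthetic 30-value series: mean 50 for the first 15 values, mean 70 afterwards.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(50.0, 5.0, 15), rng.normal(70.0, 5.0, 15)])
post = changepoint_posterior(series, sigma2=25.0, mu0=series.mean(), sigma0_2=1e4)
tau_hat = int(np.argmax(post)) + 1  # most probable last index of the first regime
```

Plotting `post` against the year gives a posterior probability plot of the kind used to locate the change point, with the largest value marking the most probable shift. Note that the result depends on the variance estimate supplied as `sigma2`.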
The calculation in this procedure may not be expressible in a simple form, but it can be estimated using a Markov chain Monte Carlo (MCMC) approach.
Table 2 shows the results of the trend identification using the Bayesian method at the 90% confidence level. Stations with a significant shift in the mean of an extreme index are indicated with bold letters in the respective boxes for an increasing trend and with shaded boxes for a decreasing trend. Half of the stations studied showed a significant trend for at least one extreme rainfall index. Stations 26 (Rumah Pam Paya Kangsar) and 40 (Bukit Berapit) have the most significant changes, with as many as 5 indices showing a significant change in trend. The XDS index is found to be decreasing significantly, while the majority of the other indices showed significant increases.
The next part of the analysis, the identification of the change point location, was also carried out. Bayesian analysis to detect the change point location was performed on all stations with a significant shift in trend as listed in Table 2. As an example, the results for station 40, which is one of the two stations with the largest number of significant changes in extreme indices, are displayed in Fig. 2. For each pair of graphs, the top graph displays the time series plot of the extreme index data, while the bottom graph shows the posterior probability plot for the change point, that is, the probability that the change point occurs at a particular point. The largest value represents the point at which the shift in trend is most likely to have occurred. Table 2 also shows the year in which the shift in trend occurred for each station and the related indices. In terms of the year in which the changes occurred, most changes occurred from the end of the 1980s to the early 1990s.
Despite that, several indices show changes occurring in the initial period of the record (before 1980), namely XDS (station 11), I95 and I99 (station 30), R95 (station 11), R99 (station 27) and N95 (station 32). This may imply that the Bayesian method is more sensitive to any change in the data, even though the number of data points may be small. Nevertheless, the existence of values that are relatively large compared to the other data (outliers) may influence the results, as this analysis is based on the change in the mean value, which is rather sensitive to any outlier in the data.
[Table 2: Results of the trend identification using the Bayesian method, by station. Columns: Station, XDS, XRS, I95, I99, R95, R99, N95, N99]

RESULTS AND DISCUSSION
In terms of climatology, the two extreme El Nino events occurring in 1982/83 and 1997/98 may be a contributing factor to the change in climate detected by the Bayesian change point analysis. Apart from that, the majority of the stations located on the west coast of the peninsula experienced significant changes in the studied indices. In general, these areas experienced rapid development from the late 1980s to the early 1990s. This factor may contribute to the more obvious climate change compared to the east coast of the peninsula. Shaharuddin (1992; 2004) found that Urban Heat Island effects in big cities influence the temperature change and directly cause an increase in rainfall intensity in these areas. This can be seen clearly at stations located in Selangor and the Federal Territory (stations 17, 18 and 22).

CONCLUSION
As a whole, the west coast of Peninsular Malaysia displays more significant changes in trend than the east coast of the peninsula. The significant increasing trends in urban areas, as seen at stations 17, 18 and 22 for extreme cumulative rainfall amount, extreme intensities and extreme frequency, need to be viewed with caution as this is a highly populated area. Many factors, such as rapid township development, industrialisation and increases in the number of vehicles and in population, may influence the pattern of rainfall in this area, where the change in trend is found to begin in the 1980s to early 1990s. Apart from that, climate phenomena such as El Nino and La Nina may play an important role in determining the weather pattern in Peninsular Malaysia. In terms of the Bayesian methods used, the existence of any outlier in the data series may influence the result, since the analysis is based on the mean value, which is very sensitive to outliers. This situation may cause the Bayesian change point analysis to show a significant change even though the other points at that station are in fact consistent.