Modeling the Distribution of Rainfall Intensity using Hourly Data

Problem statement: Storm water best management practices for controlling runoff and water pollution can be designed effectively only if the distribution of rainfall characteristics is known in advance. Rainfall intensity, particularly in tropical climates, plays a major role in the design of runoff conveyance and erosion control systems. This study aims to explore the statistical distribution of rainfall intensity for Peninsular Malaysia using hourly rainfall data. Approach: Hourly rainfall data were collected from twelve stations spread across the peninsula. A six-hour separation time was used to divide the data into individual rainfall events, and four probability distributions, namely the Generalized Pareto (GP), Exponential (EXP), Beta (BT) and Gamma (GM) distributions, were used to model the distribution of hourly rainfall intensity. The Kolmogorov-Smirnov, Anderson-Darling and Chi-squared goodness-of-fit tests were used to identify the best fit. Results: The annual rainfall frequency, based on a 6 h minimum inter-event time, ranges from 115 to 198 events. The distribution of rainfall frequency, and that of the highest intensity observed over the recorded period, is however irregular across the peninsula. The mean rainfall intensity ranges from 2.32 to 3.88 mm h⁻¹. Kuala-Lumpur and Penang received the highest mean intensity, while Segamat and Kedah received the lowest. In contrast, over the period of record, Segamat recorded the highest CV, skewness and kurtosis, while Pahang recorded the lowest values of these parameters. Goodness-of-fit tests at the 5% level of significance indicate that all four models can be used to describe the distribution of rainfall intensity in Peninsular Malaysia. However, GP is found to be the most suitable of the four probability distributions tested. Conclusion: Basic statistics of hourly rainfall intensity were obtained and probability distributions were compared. GP was found to be the most suitable model.
The results can be useful, particularly, for agricultural and storm water management planning.


INTRODUCTION
Precipitation forms in the atmosphere when air masses rise and cool. In Malaysia, rainfall is governed by two major wind systems: the North-east monsoon, which occurs from November to March, and the South-west monsoon, which occurs from May to September. Rainfall occurs throughout the year, but the North-east monsoon brings more rainfall. The country has an average annual rainfall of more than 2000 mm, while humidity ranges from 70-90%. This frequent precipitation affects the quantity and quality of surface runoff and groundwater, which are essential for human existence. Rainfall patterns affect human activities in many ways, and as such the design of agricultural, storm water management, telecommunication, and erosion and sediment control systems depends heavily on rainfall characteristics. High intensity rainfall, particularly if sustained over a long duration, is largely responsible for altering the geomorphology of a watershed; knowledge of the distribution of rainfall intensity is therefore important in the design of erosion control and runoff conveyance systems. The aim of this research is to explore the statistical distribution of rainfall intensity for Peninsular Malaysia using hourly rainfall data.
The spatial characteristics of rain rate were derived from rainfall data collected in the Tropical Rainfall Measuring Mission. Gamma and Log-normal distributions were used to model the rain rate, and the results indicate that both models fit the PDF of the rainfall data well (Cho et al., 2004). Similarly, a parametric family of PDFs was derived and compared with the Gamma and Log-normal PDFs to find the best fit for rain rate in Darwin and Florida; the results indicate that the Log-normal distribution outperforms both the Gamma distribution and the other family of PDFs (Kedem et al., 1994). A two-parameter gamma distribution, together with a Markov model, was used to describe the distribution of daily rainfall at two sites in Ghana and to predict rainfall occurrence in order to develop a rainfall simulation model; using twenty years of rainfall data, the study confirmed the applicability of the gamma distribution (Adiku et al., 1997). Continuous hourly rainfall data were derived from rainfall rate recordings at the Jardi gauge in Barcelona, Spain. The results show that rainfall duration is Exponentially distributed, the duration of rainless intervals follows a Generalized Pareto distribution, and the cumulative rainfall within the cumulative rain duration is Beta distributed (Burgueno et al., 1994).
Long-term rainfall data at hourly resolution are uncommon in developing countries, and rainfall characteristics for Peninsular Malaysia are mostly modeled using the available daily rainfall data. Deni and Jemain (2008) compared the Mixture of geometric distribution with truncated Poisson distribution against six other distributions, namely the Geometric, Compound geometric, Geometric log series, Log series, Modified log series and Truncated negative binomial distributions, for modeling the sequence of wet days in Peninsular Malaysia. The results show that the Mixture of geometric distribution with truncated Poisson distribution fits all the data successfully. Similarly, the distributions of dry and wet spells were fitted with the Mixture of log series distribution, the Mixture of log series Poisson distribution and the Mixture of log series geometric distribution; the Chi-square goodness-of-fit test indicated that the Mixture of log series geometric and the Mixture of log series Poisson distributions give a better fit (Deni and Jemain, 2009). Several types of exponential distributions were used to model daily rainfall data for Peninsular Malaysia, and the results indicated that mixtures of distributions are better at modeling daily rainfall (Suhaila and Jemain, 2007). Eight probability distributions, namely the two-parameter Gumbel and Gamma, the three-parameter Generalized normal, Generalized Pareto, Generalized extreme value, Pearson type 3 and Log-Pearson type 3, and the five-parameter Wakeby, were tested using the probability plot correlation coefficient test combined with the root mean squared error, relative root mean squared error and maximum absolute deviation; it was concluded that the Generalized extreme value distribution is the most appropriate distribution for describing the annual maximum rainfall series in Malaysia (Zalina et al., 2002).

MATERIALS AND METHODS
Data collection and analysis: Precipitation data, consisting of rainfall depths recorded at hourly intervals, were collected from the Department of Irrigation and Drainage, Malaysia. Twelve stations spread across different geographical regions of Peninsular Malaysia were selected. The records, which cover periods of 10-22 years, were examined and missing records were removed. Figure 1 shows the map of Peninsular Malaysia and the locations of the stations.
A six-hour storm separation time was used as the Minimum Inter-Event Time (MIT) to separate the data into individual rainfall events, following the discussions by Guo (2002), Adams et al. (1986) and Adams and Papa (2000) for storm water management applications. Rainfall events separated by less than 6 h are merged and considered a single event. This gives an average annual number of 163 events for the peninsula based on the 6 h MIT. The rainfall intensity of each event was obtained by dividing the rainfall depth by the duration measured from the beginning of the rainfall.
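The separation rule and intensity calculation described above can be sketched in Python. This is a minimal illustration, not the study's own code: it assumes a time-sorted list of (timestamp, depth in mm) pairs at exact hourly steps, closes an event once at least 6 dry hours follow the last wet hour, and takes the event duration to run from the start of the first wet hour to the end of the last one. The functions `split_events` and `event_intensity` are hypothetical names.

```python
from datetime import datetime, timedelta

MIT_HOURS = 6  # minimum inter-event time (h) used in the study


def split_events(hourly_depths, mit_hours=MIT_HOURS):
    """Group wet hours into events separated by at least `mit_hours` dry hours.

    hourly_depths: time-sorted (timestamp, depth_mm) pairs at 1-h resolution;
    dry hours may be listed with depth 0 or omitted entirely.
    """
    events, current, last_wet = [], [], None
    for t, depth in hourly_depths:
        if depth <= 0:
            continue  # dry hour: contributes nothing to an event
        # Adjacent wet hours are 1 h apart, so a gap of more than `mit_hours`
        # between wet timestamps means at least `mit_hours` dry hours between them.
        if last_wet is not None and t - last_wet > timedelta(hours=mit_hours):
            events.append(current)
            current = []
        current.append((t, depth))
        last_wet = t
    if current:
        events.append(current)
    return events


def event_intensity(event):
    """Event depth (mm) divided by event duration (h).

    Assumption: the duration runs from the start of the first wet hour to the
    end of the last wet hour, so a one-hour event lasts 1 h.
    """
    depth = sum(d for _, d in event)
    duration_h = (event[-1][0] - event[0][0]) / timedelta(hours=1) + 1
    return depth / duration_h


if __name__ == "__main__":
    t0 = datetime(2000, 1, 1)
    series = [(t0, 2.0), (t0 + timedelta(hours=1), 4.0),
              (t0 + timedelta(hours=5), 1.0),   # 3 dry hours before: same event
              (t0 + timedelta(hours=13), 3.0)]  # 7 dry hours before: new event
    for ev in split_events(series):
        print(len(ev), round(event_intensity(ev), 3))
```

Changing `mit_hours` makes it easy to check how sensitive the event count is to the chosen separation time.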
Distribution of rainfall intensity: The Generalized Pareto, Exponential, Beta and Gamma distributions were used to model the distribution of rainfall intensity. The empirical cumulative distribution function was determined using the equation:

$$F_x(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x)$$

where n is the sample size and I(x_i ≤ x) equals 1 if the condition holds and 0 otherwise. The Probability Density Functions (PDF) and Cumulative Distribution Functions (CDF) of the four models are given below, where x is the random variable representing the hourly rainfall intensity.
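Generalized Pareto distribution: The Generalized Pareto distribution with continuous shape parameter (k), continuous scale parameter (σ) and continuous location parameter (µ) has the PDF and CDF:

$$f(x) = \begin{cases} \dfrac{1}{\sigma}\left(1 + k z\right)^{-1-1/k}, & k \neq 0 \\[4pt] \dfrac{1}{\sigma}\exp(-z), & k = 0 \end{cases} \qquad F(x) = \begin{cases} 1 - \left(1 + k z\right)^{-1/k}, & k \neq 0 \\[4pt] 1 - \exp(-z), & k = 0 \end{cases}$$

where z = (x − µ)/σ, σ > 0, and the domain is µ ≤ x < +∞ for k ≥ 0 and µ ≤ x ≤ µ − σ/k for k < 0.

Exponential distribution: The one-parameter Exponential distribution with parameter λ has the PDF and CDF:

$$f(x) = \lambda e^{-\lambda x}, \qquad F(x) = 1 - e^{-\lambda x}, \qquad x \ge 0$$

where λ > 0 is the rate parameter (the reciprocal of the scale).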
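The Generalized Pareto and Exponential formulas above translate directly into Python using only the standard library. This sketch assumes the parameters have already been estimated (the fitting method is not shown), and the function names are hypothetical. Outside the support the density is returned as 0; for k < 0 the density at the upper bound µ − σ/k is also set to 0 here for simplicity.

```python
import math


def gp_pdf(x, k, sigma, mu):
    """Generalized Pareto PDF with shape k, scale sigma > 0 and location mu."""
    z = (x - mu) / sigma
    if z < 0:
        return 0.0
    if k == 0:
        return math.exp(-z) / sigma
    t = 1.0 + k * z
    if t <= 0:  # only possible for k < 0: x at or beyond mu - sigma/k
        return 0.0
    return t ** (-1.0 - 1.0 / k) / sigma


def gp_cdf(x, k, sigma, mu):
    """Generalized Pareto CDF; reduces to 1 - exp(-z) when k == 0."""
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0
    if k == 0:
        return 1.0 - math.exp(-z)
    t = 1.0 + k * z
    if t <= 0:  # k < 0 and x at or beyond the upper bound mu - sigma/k
        return 1.0
    return 1.0 - t ** (-1.0 / k)


def exp_pdf(x, lam):
    """Exponential PDF with rate lam > 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x, lam):
    """Exponential CDF with rate lam > 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0
```

With k = 0 and σ = 1/λ, `gp_cdf(x, 0, 1/lam, 0)` equals `exp_cdf(x, lam)`, which is the sense in which the one-parameter Exponential model is a special case of GP with zero location. If SciPy is available, `scipy.stats.genpareto(c=k, loc=mu, scale=sigma)` uses the same sign convention for the shape parameter and can serve as a cross-check.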
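Beta distribution: The PDF and CDF of the Beta distribution with continuous shape parameters α1 and α2 are:

$$f(x) = \frac{(x-a)^{\alpha_1-1}\,(b-x)^{\alpha_2-1}}{B(\alpha_1,\alpha_2)\,(b-a)^{\alpha_1+\alpha_2-1}}, \qquad F(x) = I_z(\alpha_1,\alpha_2), \qquad z = \frac{x-a}{b-a}, \quad a \le x \le b$$

where B is the Beta function, I_z is the regularized incomplete beta function, and a and b are boundary parameters (a < b).

Gamma distribution: The two-parameter Gamma distribution with continuous shape parameter (α) and continuous scale parameter (β) has the PDF and CDF:

$$f(x) = \frac{x^{\alpha-1}\,e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad F(x) = \frac{\Gamma_{x/\beta}(\alpha)}{\Gamma(\alpha)}, \qquad x > 0$$

where Γ is the gamma function and Γ_x is the (lower) incomplete gamma function.

Goodness-of-fit tests: Three goodness-of-fit tests were conducted at the 5% level of significance. In what follows, X denotes the random variable and n the sample size.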
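Returning to the Beta and Gamma models defined above: their CDFs depend on the regularized incomplete beta and gamma functions, which the Python standard library does not provide. The sketch below implements them with two standard numerical techniques, a power series for the incomplete gamma function and a continued fraction evaluated by the modified Lentz method for the incomplete beta function. It is an illustration under those choices, not the software used in the study, and all function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x, eps=1e-14, max_iter=10_000):
    """Regularized lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a).

    Power series x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n));
    it converges for all x > 0 but needs more terms when x >> a.
    """
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < total * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _beta_cf(a, b, x, eps=3e-14, max_iter=300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction at step m
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def gamma_pdf(x, alpha, beta):
    """Gamma PDF with shape alpha and scale beta (defined here for x > 0)."""
    if x <= 0:
        return 0.0
    return math.exp((alpha - 1) * math.log(x) - x / beta
                    - alpha * math.log(beta) - math.lgamma(alpha))


def gamma_cdf(x, alpha, beta):
    """Gamma CDF: P(alpha, x / beta)."""
    return reg_lower_gamma(alpha, x / beta)


def beta_pdf(x, a1, a2, a, b):
    """Four-parameter Beta PDF on (a, b) with shapes a1 and a2."""
    if not a < x < b:
        return 0.0
    log_beta_fn = math.lgamma(a1) + math.lgamma(a2) - math.lgamma(a1 + a2)
    return math.exp((a1 - 1) * math.log(x - a) + (a2 - 1) * math.log(b - x)
                    - log_beta_fn - (a1 + a2 - 1) * math.log(b - a))


def beta_cdf(x, a1, a2, a, b):
    """Four-parameter Beta CDF: I_z(a1, a2) with z = (x - a) / (b - a)."""
    return reg_inc_beta(a1, a2, (x - a) / (b - a))
```

In practice, `scipy.special.gammainc` and `scipy.special.betainc` compute the same regularized functions and would normally be preferred.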
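The tests are as follows:

Kolmogorov-Smirnov (K-S) test: This test is used to decide whether a sample comes from a hypothesized continuous distribution. It is based on the largest vertical difference between the theoretical and empirical CDFs. For a random variable X and a sample (x_1, x_2, ..., x_n), the empirical CDF of X, F_x(x), is given by:

$$F_x(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x)$$

where I(condition) = 1 if the condition is true and 0 otherwise. Given two cumulative distribution functions F_x (empirical) and F_y (theoretical), the Kolmogorov-Smirnov test statistics D⁺ and D⁻, and the overall statistic D, are given by:

$$D^{+} = \sup_{x}\left[F_x(x) - F_y(x)\right], \qquad D^{-} = \sup_{x}\left[F_y(x) - F_x(x)\right], \qquad D = \max\left(D^{+}, D^{-}\right)$$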
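The K-S statistics can be computed by evaluating the fitted CDF only at the sorted observations, because the supremum of the difference between the empirical step function and a continuous CDF is reached at, or just before, an observation. In the sketch below the fitted theoretical CDF (F_y) is passed in as a Python callable; the name `ks_statistics`, the sample values and the Exponential parameter in the usage block are illustrative assumptions only.

```python
import math


def ks_statistics(sample, cdf):
    """One-sample K-S statistics (D+, D-, D) against a continuous fitted CDF.

    With x_(1) <= ... <= x_(n), the suprema over x reduce to
    D+ = max_i [i/n - F(x_(i))] and D- = max_i [F(x_(i)) - (i-1)/n].
    """
    xs = sorted(sample)
    n = len(xs)
    f = [cdf(x) for x in xs]
    d_plus = max((i + 1) / n - fi for i, fi in enumerate(f))
    d_minus = max(fi - i / n for i, fi in enumerate(f))
    return d_plus, d_minus, max(d_plus, d_minus)


if __name__ == "__main__":
    sample = [0.4, 1.1, 2.3, 0.2, 3.8]           # illustrative values only
    fitted = lambda x: 1.0 - math.exp(-x / 1.5)  # hypothetical fitted Exponential CDF
    print(ks_statistics(sample, fitted))
```

For the two-sided test, `scipy.stats.kstest(sample, cdf)` reports the same D = max(D⁺, D⁻) and also returns a p-value, which this sketch does not compute.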

Anderson-Darling (A-D) test:
The A-D test compares the fit of an observed CDF to an expected CDF. It gives more weight to the tails of the distribution, and the test statistic A² is given by:

$$A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln F(X_i) + \ln\left(1 - F(X_{n-i+1})\right)\right]$$

where X_1 ≤ X_2 ≤ ... ≤ X_n are the ordered observations and F is the theoretical CDF.

Chi-squared (C-S) test: This test compares how well the theoretical distribution fits the empirical distribution of the binned data. The C-S test statistic is given by:

$$\chi^{2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i}$$

where O_i is the observed frequency for bin i, E_i is the expected frequency for bin i and k is the number of classes. E_i is given by:

$$E_i = n\left[F(x_2) - F(x_1)\right]$$

where F is the theoretical CDF and x_1 and x_2 are the lower and upper limits of bin i.

RESULTS
Table 1 presents the basic rainfall intensity statistics for the twelve stations. The goodness-of-fit ranking results are shown in Table 2. Fig. 2 and 3 present probability difference plots between the empirical CDF and the theoretical CDF, and Fig. 4 and 5 show the GP distribution fitted to the histograms of rainfall intensity.

Note (Table 2): Ranking is in the order 1, 2, 3, 4, where 1 is the best ranking and 4 the worst; NA: not applicable; GP: Generalized Pareto; EXP: Exponential; BT: Beta; GM: Gamma distribution.
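The A² and χ² statistics defined in the Materials and Methods section can be sketched in Python as follows. Assumptions of this sketch: the fitted CDF is a callable whose values at the observations lie strictly between 0 and 1 (otherwise the logarithms in A² are undefined), bins are half-open intervals (x1, x2] whose edges span the whole sample, and every bin has a non-zero expected frequency. The paper does not state its binning rule, so the bin `edges` are left to the caller; the function names are hypothetical.

```python
import math


def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) sum_i (2i - 1) [ln F(X_i) + ln(1 - F(X_{n-i+1}))]."""
    xs = sorted(sample)
    n = len(xs)
    s = 0.0
    for i in range(1, n + 1):
        f_low = cdf(xs[i - 1])   # F(X_i), 1-based index i
        f_high = cdf(xs[n - i])  # F(X_{n-i+1})
        s += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - s / n


def chi_squared(sample, cdf, edges):
    """Sum over bins of (O_i - E_i)^2 / E_i with E_i = n [F(x2) - F(x1)]."""
    n = len(sample)
    stat = 0.0
    for x1, x2 in zip(edges[:-1], edges[1:]):
        observed = sum(1 for x in sample if x1 < x <= x2)
        expected = n * (cdf(x2) - cdf(x1))
        stat += (observed - expected) ** 2 / expected
    return stat
```

Each statistic is compared with its critical value at the 5% level; the critical values themselves are not computed here.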

DISCUSSION
The annual rainfall frequency, based on the 6 h MIT, ranges from 115 to 198 events. Kota-Bharu has the lowest value, 115 events per annum, while Kuala-Lumpur has the highest, 198 events per annum. The distribution of rainfall frequency, as well as that of the highest intensity observed over the recorded period, is however irregular across the peninsula. Over the same period, a rainfall intensity as high as 69 mm h⁻¹ was observed at the Segamat station, while the nearby Kluang station recorded less than half of this value. The mean rainfall intensity ranges from 2.32 to 3.88 mm h⁻¹. Kuala-Lumpur and Penang received the highest mean intensity, while Segamat and Kedah received the lowest. In contrast, over the period of record, Segamat recorded the highest CV, skewness and kurtosis, reflecting the spread and peakedness of rainfall intensity at that station, while Pahang has the lowest values of these parameters.
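The CV, skewness and kurtosis used above to compare stations can be computed from one station's event intensities as sketched below. The paper does not state which estimators it used, so this sketch assumes the simple moment estimators (population standard deviation, moment skewness g1 and non-excess kurtosis b2); bias-corrected estimators would give slightly different values, especially for short records. The function name is hypothetical.

```python
import math


def summary_stats(intensities):
    """Mean, CV, skewness and kurtosis of one station's event intensities.

    Assumed estimators: population standard deviation, moment skewness
    g1 = m3 / m2**1.5 and non-excess kurtosis b2 = m4 / m2**2
    (a normal distribution gives b2 = 3).
    """
    n = len(intensities)
    mean = sum(intensities) / n
    m2 = sum((v - mean) ** 2 for v in intensities) / n
    m3 = sum((v - mean) ** 3 for v in intensities) / n
    m4 = sum((v - mean) ** 4 for v in intensities) / n
    return {
        "mean": mean,
        "cv": math.sqrt(m2) / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }
```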
The K-S, A-D and C-S goodness-of-fit tests at the 5% level of significance indicate that all four models can be used to describe the distribution of rainfall intensity in Peninsular Malaysia. However, GP is found to be the most suitable of the four probability distributions tested; the EXP distribution ranks second, while the BT and GM distributions rank third and fourth, respectively. GP is therefore recommended as the best model.
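The per-test ranking summarized in Table 2 can be reproduced from the test statistics in a few lines. This is a sketch under stated assumptions: for every test, a smaller statistic is taken to mean a better fit, and the tests are combined by summing ranks, which is one plausible rule; the paper does not state how the per-test rankings were combined into the overall order reported above. A model marked NA for a test is simply omitted from that test's dictionary, in which case rank sums are only comparable among models ranked by the same tests. The function names are hypothetical.

```python
def rank_models(stats):
    """Rank models within each test; rank 1 = smallest statistic = best fit.

    stats: {test_name: {model_name: statistic}}; omit a model for any test
    where that test is not applicable (NA).
    """
    ranks = {}
    for test, by_model in stats.items():
        ordered = sorted(by_model, key=by_model.get)
        ranks[test] = {model: r for r, model in enumerate(ordered, start=1)}
    return ranks


def overall_order(ranks):
    """Order models by their rank sum over all tests (assumed combination rule)."""
    totals = {}
    for by_model in ranks.values():
        for model, r in by_model.items():
            totals[model] = totals.get(model, 0) + r
    return sorted(totals, key=totals.get)
```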
The probability difference plotted in Fig. 2 and 3 is calculated as the difference between the empirical CDF and the theoretical CDF:

$$\Delta(x) = F_x(x) - F(x)$$

where F_x is the empirical CDF and F is the fitted theoretical CDF. Low intensity events occur at higher frequency and therefore show larger probability differences than high intensity events. Fig. 4 and 5 display the GP model (the best model) fitted to the histograms of rainfall intensity for the Segamat and Penang stations.
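The probability difference can be computed at each distinct observed intensity as sketched below, assuming the fitted CDF is available as a Python callable (for example, a GP CDF with fitted k, σ and µ). Evaluating only at the observed values is an assumption of this sketch; a plot could equally use a fine grid of intensities. The function name is hypothetical.

```python
from bisect import bisect_right


def probability_difference(sample, cdf):
    """Return (x, F_x(x) - F(x)) at each distinct observed intensity x.

    F_x is the empirical CDF (share of observations <= x); F is the fitted CDF.
    """
    xs = sorted(sample)
    n = len(xs)
    # bisect_right counts how many observations are <= x, which handles ties
    return [(x, bisect_right(xs, x) / n - cdf(x)) for x in sorted(set(xs))]
```

Plotting the returned pairs against x gives a curve of the kind described for Fig. 2 and 3.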

CONCLUSION
Basic statistical characteristics of hourly rainfall intensity for Peninsular Malaysia have been obtained using hourly data recorded at twelve stations spread across the peninsula. Four probability distributions, namely the Generalized Pareto, Exponential, Beta and Gamma distributions, were tested to model the distribution of hourly rainfall intensity, and the Kolmogorov-Smirnov, Anderson-Darling and Chi-squared goodness-of-fit tests were used to evaluate the fit at the 5% level of significance. GP was found to be the most suitable distribution for modeling hourly rainfall intensity and, based on these findings, it is recommended as the best model for describing hourly rainfall intensity.