Adjustment of Peak Streamflows of a Tropical River for Urbanization

Peak runoff from a catchment is influenced by many factors, such as the intensity and duration of rainfall, catchment topography, catchment shape, land use and other variables. For a particular catchment, land-use change and other human activities alter the characteristics of the catchment hydrograph. Problem statement: As a result of urbanization, the magnitude of floods occurring in the catchment has increased. Land-use change in the Langat River catchment was found to have a clear impact on the annual peak streamflow record, particularly from 1983-2003, whereas it had no significant impact on the record from 1960-1982. Spatial data confirm the heavy development that occurred in the river basin from 1983-2003. Urbanization therefore makes the historical record of the Langat River non-homogeneous, which makes mathematical simulation of the record inappropriate because of the poor expected output. Approach: In this study, the historical record of the Langat River, Selangor, Malaysia, from 1960-2003 was used to study the impact of urbanization on streamflow. The annual peak streamflow was selected for this purpose. The peak streamflow record was divided into two sets, set one from 1960-1982 and set two from 1983-2003, which represent the periods before and after urbanization, respectively. To adjust the set one data for urbanization, different adjustment factors were applied to make the data homogeneous. A Normal model was applied to find the factor giving the best model fit. Results: The best adjustment factors were selected by trial and error at the 95% confidence level. To determine the optimum adjustment factor among them, the point of intersection between the homogeneity and Normal model evaluation curves was located. This point represents the optimum adjustment factor, and its value was found to be 1.9. An Autorun model was used to validate this finding, and the model prediction was found to be acceptable with reasonable accuracy.
Conclusion/Recommendations: The proposed method can be recommended for predicting the streamflow of the Langat River while taking into account the impact of urbanization. The method is also applicable to adjusting a flood record from a watershed undergoing continuous changes in the level of urbanization, so that the record meets both homogeneity and best model fitness.


INTRODUCTION
Streams are affected by runoff from rainfall and snowmelt moving as overland flow and subsurface flow. Urbanization is a major land-use change, characterized mainly by a low infiltration rate. The sensitivity of river floods to land-use change has been shown to depend strongly on the climatic behavior and the geomorphologic characteristics of the river basin [2]. The impact of urbanization is more significant on catchments located in tropical regions than on those located in arid or semiarid regions. Therefore, urbanized catchments in tropical regions experience regular and intense flooding. For instance, in a tropical catchment, 30% urbanization of the basin induced a 24% increase in the annual mean discharge [3]. The Langat River, a tropical catchment, has experienced rapid urbanization. The growth of urbanization results in the rapid creation of large impervious areas, which produces significant problems such as regular flooding due to inadequate drainage facilities. In many studies, regardless of the specific modeling approach, it is most usual to fit a model to the data describing land-use change, verify that model on data and use it to predict the future [1]. Reliable estimation of discharge is needed for the sound design of urban drainage systems. Urbanization makes the historical record of the Langat River non-homogeneous, which makes existing mathematical models of the historical data unsuitable because of their poor estimated output. The literature does not identify a single method considered best for adjusting a flood record or homogenizing historical data. Each method depends on the data used to calibrate it, and the databases used to calibrate the methods are very sparse [6].
The objectives of this study are to analyze the change in flow behavior due to urbanization in the Langat River catchment and to propose a methodology for both adjusting the streamflow record and calibrating the model parameters from non-homogeneous data. The presented methodology optimizes the adjusted annual maximum streamflow to meet both homogeneity and best model fitness at the Langat River and at other catchments facing similar issues.
Study site, data acquisition and data processing: The area of interest for this study is the Langat River catchment at Dengkil, Selangor, Malaysia, located south of Kuala Lumpur, the capital of Malaysia. Hydrometeorologically, the basin experiences two monsoons, the Northeast monsoon (November to March) and the Southwest monsoon (May to September). The average annual rainfall in the study area is about 2400 mm; the wettest months (April and November) receive more than 250 mm, while the driest month, June, receives about 100 mm on average [5]. Topographically, the Langat basin can be divided into three regions: the mountainous area in the north, the undulating land in the centre of the basin and the flat flood plain in the downstream reach of the Langat River.
Over the last decades, the Langat River catchment has been subjected to intensive urbanization. Figure 1 shows the degree of urbanization that occurred in the area. The available streamflow record extends from 1960 to 2003 and is used to analyze the impact of urbanization on the increase in streamflow in the Langat River. The historical record exhibits a clear change in streamflow between the period before urbanization and the period after it: 1960-1982 is considered the period before urbanization, while 1983-2003 is taken as the period of intensive urbanization in the Langat River catchment.
The changes in maximum streamflow resulting from land-use changes in the Langat River catchment were analyzed. Average annual runoff increased from 1983-2003 because of the decrease in forest area and the development of agricultural and urban areas. Using a t-based confidence interval, the range of the mean of the peak streamflows before urbanization (1960-1982) was calculated at the 5% significance level.
The calculated mean ranges from 99.78 to 149.4. The mean of the peak streamflows after urbanization was found to lie significantly outside this range, which indicates that the peak streamflow record is non-homogeneous. The data recorded after urbanization show a large increase in both the mean and the standard deviation, as shown in Table 1. Estimation of extreme floods is a main application of hydrology, and the estimated floods are mainly used in the design of water resource projects and in flood-plain management. Therefore, to illustrate the changes, flood frequency was estimated with the Lognormal distribution, which gave the best fit, for the periods before and after urbanization; the results are shown in Table 2.

Table 2: Peak streamflow (m³ s⁻¹) for selected return periods, Lognormal distribution
Return period (year)   Before urbanization (1960-1982)   After urbanization (1983-2003)   Increment (%)
2                      113                               234                              107
5                      159                               324                              104
10                     191                               385                              102
25                     231                               461                              100
50                     261                               519                              99
100                    292                               577                              98
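The two calculations in this paragraph, a t-based confidence interval for the mean and Lognormal flood quantiles, can be sketched in Python as below, together with the increment percentage used in Table 2. This is a minimal sketch, assuming the Lognormal distribution is fitted by the mean and standard deviation of the logarithms (the text does not state the fitting method); the function names are illustrative and the peaks in the usage section are hypothetical, not the Langat River record.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(x, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the mean using Student's t."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sem = x.std(ddof=1) / np.sqrt(n)                  # standard error of the mean
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return x.mean() - t_crit * sem, x.mean() + t_crit * sem


def lognormal_quantiles(x, return_periods):
    """T-year peaks from a two-parameter Lognormal fitted by the moments of ln(x)."""
    logs = np.log(np.asarray(x, dtype=float))
    mu_y, sigma_y = logs.mean(), logs.std(ddof=1)
    T = np.asarray(return_periods, dtype=float)
    z = stats.norm.ppf(1.0 - 1.0 / T)                 # standard normal variate for each T
    return np.exp(mu_y + z * sigma_y)


def percent_increase(before, after):
    """Increment (%) from the pre- to the post-urbanization estimate."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Hypothetical annual peaks (m^3/s), NOT the Langat River record.
    peaks = [80.0, 95.0, 110.0, 70.0, 150.0, 120.0, 60.0, 200.0, 90.0, 105.0]
    lo, hi = mean_confidence_interval(peaks)
    print(f"95% CI for the mean: {lo:.1f} to {hi:.1f}")
    periods = [2, 5, 10, 25, 50, 100]
    for T, q in zip(periods, lognormal_quantiles(peaks, periods)):
        print(f"T = {T:>3} yr: {q:.1f} m^3/s")
```

Applied to the 2-year row of Table 2, `percent_increase(113, 234)` gives about 107%, matching the tabulated increment.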

MATERIALS AND METHODS
The historical daily streamflow record of the Langat River for a period of 44 years (1960-2003) is used in this study, and the annual maximum streamflows are selected from it. Any month with an incomplete daily record is treated as a gap. The gaps are filled using a linear stochastic model, the Thomas-Fiering model, which is based on a first-order Markov model and consists of a set of 12 regression equations, one for each month. The Thomas-Fiering model equation is [11]:

$$X_{i,j} = \bar{Q}_j + r_j \frac{S_j}{S_{j-1}} \left( X_{i,j-1} - \bar{Q}_{j-1} \right) + a_{i,j} S_j \sqrt{1 - r_j^2} \tag{1}$$

Where:
$X_{i,j}$ = predicted discharge for the jth month from the (j-1)th month at time i
$\bar{Q}_j$ = mean monthly discharge during month j
$S_j$ = standard deviation of the monthly discharge during month j
$a_{i,j}$ = independent standard normal variable at time i in the jth month
$r_j$ = serial correlation coefficient between the discharges in the jth and (j-1)th months

Negative values obtained from Eq. 1 are ignored.
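A minimal Python sketch of gap filling with Eq. 1 follows. It assumes the record is arranged as a years × 12 matrix with gaps stored as NaN, that the monthly statistics are estimated from the observed values only, and that a negative generated value is clipped to zero (the text only says negative values are ignored); `thomas_fiering_fill` is a hypothetical name, not the authors' code.

```python
import numpy as np


def _paired_corr(a, b):
    """Pearson correlation of a and b using only pairs where both are observed."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    return np.corrcoef(a[mask], b[mask])[0, 1]


def thomas_fiering_fill(flows, rng=None):
    """Fill NaN gaps in a (n_years x 12) monthly flow matrix with Eq. 1.

    Hypothetical helper: statistics come from the observed values only, and
    negative generated flows are clipped to zero (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.array(flows, dtype=float)              # work on a copy
    q_mean = np.nanmean(X, axis=0)                # mean discharge of each month j
    s = np.nanstd(X, axis=0, ddof=1)              # standard deviation of each month j

    # r[j]: serial correlation between month j and month j-1;
    # January (j = 0) is paired with December of the previous year.
    r = np.empty(12)
    r[0] = _paired_corr(X[1:, 0], X[:-1, 11])
    for j in range(1, 12):
        r[j] = _paired_corr(X[:, j], X[:, j - 1])

    # Walk through the record in time order so each gap can use the
    # (observed or already filled) value of the preceding month.
    for i in range(X.shape[0]):
        for j in range(12):
            if not np.isnan(X[i, j]):
                continue                          # observed value: keep it
            if i == 0 and j == 0:
                X[i, j] = q_mean[0]               # no preceding month: use the mean
                continue
            jp = j - 1 if j > 0 else 11           # index of the preceding month
            x_prev = X[i, j - 1] if j > 0 else X[i - 1, 11]
            a = rng.standard_normal()             # a_{i,j} ~ N(0, 1)
            value = (q_mean[j]
                     + r[j] * s[j] / s[jp] * (x_prev - q_mean[jp])
                     + a * s[j] * np.sqrt(1.0 - r[j] ** 2))
            X[i, j] = max(value, 0.0)
    return X
```

Only the gaps are generated; observed months are returned unchanged, so the annual maxima can then be extracted from the completed matrix.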

Time series model:
Time series modeling is the analysis of a temporally distributed series of data, or the synthesis of a model for prediction, in which time is an independent variable. One aim of time series and stochastic hydrology models is to produce synthetic streamflow series that are statistically similar to the observed series, that is, sequences whose statistics and dependence properties resemble those of the historical record. Such sequences represent plausible future streamflow scenarios under the assumption that the future will resemble the past. A time series model requires the estimation of parameters such as the mean, the standard deviation and the serial correlation coefficients, which can be determined from the historical record. Finally, the suitability of the identified model is verified by conventional goodness-of-fit tests.
This study uses simple split-sample testing, which divides the available measured time series into two sets in order to apply a common framework for building and testing a time series model. One set of data is used to calibrate the model parameters and the other is used to test the validity of the model by comparing the model output with the observed data [4]. As mentioned above, the Langat River has non-homogeneous streamflow data, which makes mathematical simulation of the record inappropriate because of the poor expected results. In addition, simple split-sample testing is not well established as a way of demonstrating that a model is suitable for simulating hydrological conditions in watersheds that are undergoing change and have non-homogeneous data. It is therefore important to select a proper method for adjusting the streamflow record for land-use change in order to obtain a homogeneous record. Figure 2 shows the two sets of the historical peak streamflow record for the Langat River.
In the present study, a trial and error method is used to find the optimum adjustment factor for converting the maximum streamflow record from non-homogeneous to homogeneous and for validating the model. The level of adjustment and the model fitness are verified with the Analysis of Variance (ANOVA) test, which performs an analysis of variance on data from two or more samples. The analysis tests the hypothesis that all samples are drawn from the same underlying probability distribution against the alternative hypothesis that the underlying distributions are not all the same. When there are two groups, ANOVA and the pooled-variance t-test give equivalent results (F = t²). The result of the ANOVA is expressed by the ANOVA factor (the F statistic), which is compared with the critical value at the chosen confidence level. Table 3 shows the ANOVA used to test the homogeneity of the historical record for the Langat River before urbanization (1960-1982) and after urbanization (1983-2003).
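The ANOVA F statistic and its critical value can be sketched as below. This assumes a standard one-way ANOVA, with SciPy providing the F distribution; `one_way_anova` and `is_homogeneous` are illustrative names. It further assumes 44 annual values in total for the two-sample test, giving degrees of freedom (1, 42), and three series of 22 values for the three-sample test, giving (2, 63); under these assumptions the critical values reproduce the 4.07 and 3.14 used in this study.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the given samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def is_homogeneous(*groups, alpha=0.05):
    """True if F is below the critical F at the (1 - alpha) confidence level."""
    f_value, df1, df2 = one_way_anova(*groups)
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)
    return f_value < f_crit, f_value, f_crit


# Critical F values at the 95% level for the assumed degrees of freedom.
print(round(stats.f.ppf(0.95, 1, 42), 2))   # 4.07 (two series, 44 values)
print(round(stats.f.ppf(0.95, 2, 63), 2))   # 3.14 (three series of 22 values)
```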
The procedure adjusts the maximum streamflow record for 1960-1982 (set one) to the present impervious cover conditions of 1983-2003 (set two). To adjust set one, the flow values were multiplied by modification factors ranging from 1 to 3.5. In total, 18 adjustment factors were used, generating 18 new series. An ANOVA was performed for each adjusted series to verify its homogeneity against set two, and the F value was determined. Factors whose F value is less than the critical value of 4.07 form the range from which the modification factor can be selected for data homogeneity at the 95% confidence level (Fig. 3). After this range was found, a trial and error method was used to find the modification factor that best validates the time series models. For this purpose, a Normal model is used to find the factor that gives the best fit. Following the simple split-sample method, the model parameters (mean and standard deviation) were calibrated on set one and the model was validated on set two. For 12 different adjustment factors, 12 Normal models were applied and 12 series of data were predicted. Each series contains 22 annual maximum streamflows, the same number as set two. Each predicted series was compared with the observed data (set two) by ANOVA, giving 12 F values. The F values were plotted against the modification factors, as shown in Fig. 3. The two curves in Fig. 3 intersect at a point that gives the optimum adjustment factor for both data homogeneity and the best fit of the time series models. A flowchart of the methodology is shown in Fig. 4.
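The scan over modification factors and the location of the crossing point in Fig. 3 can be sketched as follows. This is a minimal sketch, not the authors' code: the homogeneity curve is computed as described (F of the scaled set one against set two), but the Normal-model fitness curve would have to come from repeating the calibration and validation for each factor, and locating the crossing by linear interpolation between sampled factors is an assumption. Evenly spaced factors and the synthetic series in the usage section are also assumptions.

```python
import numpy as np


def anova_f(a, b):
    """One-way ANOVA F statistic for two samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    grand = np.concatenate([a, b]).mean()
    ss_between = a.size * (a.mean() - grand) ** 2 + b.size * (b.mean() - grand) ** 2
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    return ss_between / (ss_within / (a.size + b.size - 2))   # df_between = 1


def homogeneity_curve(set_one, set_two, factors):
    """F statistic of (factor * set one) against set two for each trial factor."""
    set_one = np.asarray(set_one, dtype=float)
    return np.array([anova_f(c * set_one, set_two) for c in factors])


def crossing_point(factors, curve_a, curve_b):
    """Factor at which two sampled curves cross (linear interpolation on the
    first sign change of curve_a - curve_b); None if they do not cross."""
    x = np.asarray(factors, dtype=float)
    d = np.asarray(curve_a, dtype=float) - np.asarray(curve_b, dtype=float)
    for i in range(len(d) - 1):
        if d[i] == 0.0:
            return float(x[i])
        if d[i] * d[i + 1] < 0.0:
            return float(x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i]))
    return float(x[-1]) if d[-1] == 0.0 else None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for set one (23 values) and set two (21 values).
    set_one = rng.lognormal(mean=4.5, sigma=0.5, size=23)
    set_two = rng.lognormal(mean=5.2, sigma=0.5, size=21)
    factors = np.linspace(1.0, 3.5, 18)               # 18 trial factors
    f_homog = homogeneity_curve(set_one, set_two, factors)
    print("admissible factors:", np.round(factors[f_homog < 4.07], 2))
    # f_model would come from calibrating a Normal model on each adjusted
    # set one and comparing its output with set two; then:
    # optimum = crossing_point(factors, f_homog, f_model)
```

Interpolating between sampled factors is why a sufficiently dense set of trial factors matters: the crossing can only be located as precisely as the factor grid allows.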
A time series model assumes that the data are independent, homoscedastic and typically normally distributed. If the constant-variance and normality assumptions do not hold, they can often be satisfied reasonably well after the observations are transformed with a Box-Cox transformation, which can be written in either of the following forms [11]:

$$Z_i = \frac{(X_i + C)^{\lambda} - 1}{\lambda}, \quad \lambda \neq 0, \quad i = 1, \dots, n \tag{2}$$

$$Z_i = \ln(X_i + C), \quad \lambda = 0, \quad i = 1, \dots, n \tag{3}$$

Where:
$Z_i$ = transformed value of the $X_i$ series
$\lambda$ = exponent of the transformation
$X_i$ = discrete time series value at time i
$C$ = constant of the transformation
$n$ = number of data

λ can be estimated by trial and error so that the coefficient of skewness of Z is nearly zero. The value of λ lies between -1.0 and +1.0 [10]. It has been observed empirically that increasing or decreasing λ results in a systematic increase or decrease in the coefficient of skewness.
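The trial-and-error estimation of λ can be sketched as a grid search over [-1, 1] that keeps the value giving the skewness closest to zero. This is a minimal sketch, assuming a fixed shift constant `c` (the text does not give the iteration used to find C) and the moment estimator of skewness; the function names are illustrative.

```python
import numpy as np


def skewness(z):
    """Sample coefficient of skewness (moment estimator, no bias correction)."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


def box_cox(x, lam, c=0.0):
    """Box-Cox transform with shift constant c (x + c must be positive)."""
    y = np.asarray(x, dtype=float) + c
    if abs(lam) < 1e-12:
        return np.log(y)                       # the lambda = 0 form
    return (y ** lam - 1.0) / lam


def fit_lambda(x, c=0.0, grid=np.linspace(-1.0, 1.0, 201)):
    """Trial-and-error search: lambda in [-1, 1] whose transform has |skewness| closest to 0."""
    abs_skews = [abs(skewness(box_cox(x, lam, c))) for lam in grid]
    return float(grid[int(np.argmin(abs_skews))])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.lognormal(mean=4.0, sigma=0.6, size=200)   # synthetic right-skewed series
    lam = fit_lambda(x)
    print(f"lambda = {lam:.2f}, skewness {skewness(x):.2f} -> {skewness(box_cox(x, lam)):.3f}")
```

Because skewness changes monotonically with λ, as noted above, a simple grid (or bisection) over the admissible interval is enough to find the value that makes it close to zero.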
The normal distribution is a symmetrical distribution for which the sample skewness coefficient is approximately normally distributed with mean 0 and variance 6/n, and the sample (excess) kurtosis coefficient is approximately normally distributed with mean 0 and variance 24/n, where n is the sample size. The skewness and kurtosis coefficients for set one of the maximum streamflow record for the Langat River are 2.08 and 5, respectively. These coefficients indicate a highly asymmetric distribution around the mean and a relatively peaked distribution. Another standard test is available to verify whether data are normally distributed: if the data are normally distributed, their cumulative distribution should plot as a straight line on normal probability paper [11]. The probability plot (P-P plot) test, which is generally used to determine whether the distribution of a variable matches a given distribution, is based on this idea.
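Using the standard large-sample result that the sample skewness and excess kurtosis of normal data have variances of about 6/n and 24/n, a quick normality check can be sketched as below. It assumes approximate 95% limits of ±1.96 standard errors, n = 23 for set one (1960-1982), and that the reported kurtosis values are excess kurtosis, as SPSS reports them; `within_normal_bounds` is an illustrative name.

```python
import math


def within_normal_bounds(skew, kurt, n, z=1.96):
    """Check sample skewness and excess kurtosis against approximate
    large-sample normal limits: Var(skew) ~ 6/n, Var(excess kurtosis) ~ 24/n."""
    skew_limit = z * math.sqrt(6.0 / n)
    kurt_limit = z * math.sqrt(24.0 / n)
    return abs(skew) <= skew_limit and abs(kurt) <= kurt_limit


n = 23  # assumed length of set one (1960-1982)
print(within_normal_bounds(2.08, 5.0, n))    # set one before transformation: False
print(within_normal_bounds(0.020, 0.86, n))  # transformed set one: True
```

With n = 23 the limits are about ±1.00 for skewness and ±2.00 for kurtosis, so the untransformed coefficients (2.08 and 5) fall outside them, while those of the transformed series (0.020 and 0.86) fall inside.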
SPSS 16 software is used to perform the P-P plot test, and the result in Fig. 5a indicates that the data are not normally distributed. Because of this non-normality, an iterative method was used to find the constant parameters in Eq. 2 for set one of the Langat River data. Finally, a transformation formula (Eq. 4) was derived to transform the maximum streamflow to a normal distribution after set one was multiplied by the selected adjustment factor of 1.9. The transformed streamflow data are tested for normality using the skewness and kurtosis coefficients and the P-P plot test. The skewness and kurtosis coefficients are 0.020 and 0.86, respectively, and the P-P plot is shown in Fig. 5b. These coefficients are within the acceptable range and the P-P plot is close to a straight line. Thus, the transformed data match the normal distribution well and can be treated as a normally distributed series.
To assess the proposed methodology, the Autorun model was applied to generate a streamflow series, and the generated streamflow agrees well with the recorded data. The methodology for applying the two time series models, Normal and Autorun, is discussed below.
Normal model: The normal distribution is a symmetrical, bell-shaped Probability Density Function (PDF) of a continuous random variable, given as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty$$

The two parameters of the distribution, the mean µ and the standard deviation σ, are estimated from sample data. Once the data are normalized, the normal distribution can be used as a time series model to generate sequences of data. Set one of the Langat River data, after adjustment (multiplication by 1.9), was transformed to the normal domain using Eq. 4, and the parameters of the Normal model for the Langat River streamflow were determined from the transformed data. Then the required random numbers with a mean of 0 and a standard deviation of 1 were generated and used in Eq. 5 to predict the annual maximum streamflow. Finally, by applying the inverse of the transformation (Eq. 4) to the data generated by the Normal model (Eq. 5), a new set of maximum streamflow data for the Langat River is obtained, as shown in Fig. 6.
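The Normal model steps (adjust, transform, fit µ and σ, draw standard normal deviates, back-transform) can be sketched as below. Because the exact form of the derived transformation (Eq. 4) is not given here, this sketch assumes a plain Box-Cox transform with constants `lam` and `c` and its algebraic inverse; `normal_model` and `inverse_box_cox` are hypothetical names.

```python
import numpy as np


def box_cox(x, lam, c=0.0):
    """Assumed forward transform (Box-Cox with shift c)."""
    y = np.asarray(x, dtype=float) + c
    return np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam, c=0.0):
    """Inverse of box_cox; for lam != 0, values with lam * z + 1 <= 0 have no inverse (nan)."""
    z = np.asarray(z, dtype=float)
    y = np.exp(z) if abs(lam) < 1e-12 else (lam * z + 1.0) ** (1.0 / lam)
    return y - c


def normal_model(set_one, factor, lam, c, n_years, rng=None):
    """Adjust set one, transform it, fit mu and sigma, generate standard
    normal deviates and back-transform them to streamflow."""
    rng = np.random.default_rng() if rng is None else rng
    z = box_cox(factor * np.asarray(set_one, dtype=float), lam, c)
    mu, sigma = z.mean(), z.std(ddof=1)          # Normal model parameters
    eps = rng.standard_normal(n_years)           # mean 0, standard deviation 1
    return inverse_box_cox(mu + sigma * eps, lam, c)
```

For example, `normal_model(set_one, 1.9, lam, c, 22)` would produce one synthetic 22-year series for comparison with set two, given fitted values of `lam` and `c`.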
Autorun model: The Autorun model is a mechanism for synthetic flow generation that preserves the historical wet and dry spell properties in addition to the classical statistical parameters of the flow sequence. It requires the identification of a suitable parsimonious model. Conventional models are not able to reproduce historical drought and wet spell lengths, whereas the Autorun model preserves run lengths [7]. There is some serial dependence between successive events in a single series and cross-dependence between two simultaneous series. Persistence refers to the fact that high values tend to follow high values and low values tend to follow low values. From a classical point of view, a run is defined as a sequence of observations of the same kind preceded and followed by at least one observation of a different kind. Truncating a recorded sequence at a preassigned level provides a quantitative basis for persistence, which measures the tendency of high flows to follow high flows and low flows to follow low flows [7]. In particular, if $x_i$ and $x_{i-k}$ are two dependent events with joint probability $P(x_i, x_{i-k})$, their conditional probability is denoted by $P(x_i \mid x_{i-k})$, where k is the time difference between the two events, referred to as the lag, and $P(x_{i-k})$ is the marginal probability of event $x_{i-k}$. These probabilities are related in the Autorun method by [9]:

$$P(x_i, x_{i-k}) = P(x_i \mid x_{i-k})\, P(x_{i-k}) \tag{6}$$

For a Markovian process, the conditional probability is defined as the Autorun coefficient, $r_k = P(x_i \mid x_{i-k})$, so that the equation becomes [8]:

$$P(x_i, x_{i-k}) = r_k\, P(x_{i-k}) \tag{7}$$

In addition, the following relationship between the Autorun coefficient and the autocorrelation coefficient has been derived [7]:

$$r_k = \frac{1}{2} + \frac{1}{\pi} \arcsin \rho_k \tag{8}$$

Where:
$r_k$ = Autorun coefficient
$\rho_k$ = autocorrelation coefficient at lag k

The graph of $r_k$ vs. k is called the Autorun function. In the Autorun model, the run length is the basic parameter describing the property of runs. The average lengths of positive (wet) and negative (dry) periods at a given truncation level are estimated with a run-length equation in which:
y = the geometrically distributed, integer-valued random variable representing the length of a wet or dry period
ε = a uniformly distributed random variable between 0 and 1
$r_1$ = the lag-one Autorun coefficient, which takes the value $p_r$ (positive runs) or $n_r$ (negative runs) as required

The truncation level is selected here as the median of set one of the Langat River data after adjustment and normalization (Fig. 7). With this choice, the numbers of high and low flows with respect to the median are equal to n/2 for an even number of observations. For the transformed data, the median is 1.372, which corresponds to a streamflow of 102.82 m³ s⁻¹, the value recorded in 1979; this streamflow value is obtained by applying Eq. 4 in reverse. The Autorun parameters were determined from set one (adjusted and normalized). The Autorun model was then applied and its output was converted back to streamflow using Eq. 4. Figure 6 compares the results of the Autorun model, the Normal model and set two of the data.
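Eq. 8 and the empirical Autorun coefficient at a median truncation level can be sketched as below, together with a generator for geometric run lengths. The run-length generator uses standard inverse-transform sampling of a geometric variable from a uniform ε, y = 1 + ⌊ln ε / ln r_1⌋, which is an assumption consistent with the variables defined above rather than the paper's own equation; the function names are illustrative.

```python
import math
import numpy as np


def autorun_from_autocorrelation(rho_k):
    """Eq. 8: r_k = 1/2 + arcsin(rho_k) / pi (normal process, median truncation)."""
    return 0.5 + math.asin(rho_k) / math.pi


def empirical_autorun(x, k=1, level=None, upper=True):
    """Estimate r_k = P(state at i | same state at i-k), where the state is
    'above the truncation level' (upper=True, giving p_r for k = 1) or
    'below it' (upper=False, giving n_r). The level defaults to the median."""
    x = np.asarray(x, dtype=float)
    x0 = np.median(x) if level is None else level
    state = x > x0 if upper else x < x0
    prior = state[:-k]                     # condition on x_{i-k}
    joint = state[k:] & prior              # x_i and x_{i-k} in the same state
    return joint.sum() / prior.sum()


def sample_run_length(r1, rng):
    """Geometric run length y >= 1 with P(y = j) = (1 - r1) * r1**(j - 1),
    drawn by inverse transform from a uniform epsilon in (0, 1]."""
    eps = 1.0 - rng.random()               # rng.random() lies in [0, 1)
    return 1 + math.floor(math.log(eps) / math.log(r1))


def mean_run_length(r1):
    """Expected run length of the geometric model, 1 / (1 - r1)."""
    return 1.0 / (1.0 - r1)
```

For a purely random series, Eq. 8 gives r_k = 0.5 and a mean run length of 2; persistence (r_1 > 0.5) lengthens the simulated wet and dry spells.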

Model validation:
The capability of a model to generate trustworthy results is usually assessed by evaluating simulated values over a variety of conditions. One of the most commonly used tests is based on the autocorrelation coefficients, which were computed for both models; the results indicate that the simulated data are reasonably independent at the 5% significance level. In addition, the models were validated by comparing the generated data against the recorded data. The ANOVA results for the Normal and Autorun models are shown in Table 4. The F value for the Autorun model is 0.54, which is less than the critical F value (4.07), indicating that the goodness of fit of the Autorun model is within the 95% confidence level. ANOVA can also test the homogeneity of more than two samples. In this study, the two generated series and the historical record for the Langat River were tested against each other. The F value, with two degrees of freedom between groups, is 0.49, which is less than the critical F value (3.14). This result indicates that the generated streamflows (from the Autorun and Normal models) and the historical record agree within the 95% confidence level, as shown in Table 5.
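The autocorrelation-based independence check can be sketched as below. The text does not state the confidence limits used for this test; this sketch assumes the common large-sample white-noise band of ±1.96/√n at the 5% significance level, and `looks_independent` is an illustrative name.

```python
import numpy as np


def autocorrelation(x, k):
    """Lag-k sample autocorrelation coefficient."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d[:-k] * d[k:]).sum() / (d ** 2).sum())


def looks_independent(x, max_lag=3, z=1.96):
    """True if |r_k| <= z / sqrt(n) for every lag k = 1..max_lag
    (assumed approximate 95% band for a purely random series)."""
    bound = z / np.sqrt(len(x))
    return all(abs(autocorrelation(x, k)) <= bound for k in range(1, max_lag + 1))
```

A generated series of 22 values would pass this check only if every |r_k| up to the chosen lag stays below about 0.42.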
In addition, the Autorun test is applied to the Langat River data [8]. The dependence structure of any hydrological variable can be measured by the Autorun coefficient rather than by the classical correlation coefficient, which relies on restrictive assumptions [9]. The most commonly used significance level is α = 0.05, which corresponds to a normal deviate of t_α = 1.645. The confidence limits are centred on r_k = 0.5, the value that Eq. 8 gives for a purely random process (ρ_k = 0), and for the 5% significance level they are given in [8]. If r_k lies within these limits, the hypothesis that the sequence is generated by a purely random process is accepted; otherwise, it is rejected. The Autorun test was applied to the annual maximum streamflow of the Langat River, and the results are shown in Fig. 9.

RESULTS
The effects of urbanization in the Langat River catchment at Dengkil, Selangor, Malaysia, were analyzed. Streamflow data for 44 years were acquired from the Department of Irrigation and Drainage Malaysia and used to investigate whether the recorded streamflow of the Langat River is affected by urbanization and land-use change. The streamflow record exhibits two distinct periods: 1960-1982, before the land-use change (original levels of imperviousness), and 1983-2003, which experienced heavy land-use change (heavy urbanization). The means and standard deviations of the two periods are compared in Table 1.
To illustrate the changes, frequency analyses were performed for the periods before and after urbanization, and the results are shown in Table 2. An ANOVA of the homogeneity of the recorded data was also performed, and the result is shown in Table 3.
This study presents the application of the proposed methodology to non-homogeneous data in flood prediction. The methodology is applied to streamflow data for the Langat River with the goal of homogenizing and predicting the data. To evaluate the methodology, the Langat River streamflow was used to calibrate the parameters of the Autorun model as a stochastic model (Fig. 6). The F-test, the correlation test and the Autorun test were used in the validation process, and the results are shown in Table 4, Fig. 8 and Fig. 9, respectively. In addition, to verify that the selected adjustment factor is optimal for the fit of both models, an ANOVA with two degrees of freedom was performed (Table 5).

DISCUSSION
The changes in peak streamflow for the Langat River catchment were analyzed. As expected, the mean increased, because earlier events occurred when less impervious cover existed, and because of this change in the mean, the exceedance probabilities also changed. The results of the frequency analysis (Table 2) highlight the sensitivity of the flood regime to the land-use change that occurred, with a considerable increase in peak discharge, particularly for lower return periods.
To assess the normality of the data, the skewness and kurtosis coefficients and the P-P plot test for set one of the maximum streamflow of the Langat River were determined. These coefficients and the P-P plot (Fig. 5a) indicate a highly asymmetric distribution around the mean and a relatively peaked distribution. Furthermore, the ANOVA used to verify the homogeneity of the recorded data gave a high F value of 21.97 compared with the critical F value at the 95% confidence level (4.07), indicating a significant difference between the two sets of data. The results of these tests and of the frequency analysis for the Dengkil catchment, an urbanized catchment, indicate the need to adjust the streamflows for normality and homogeneity before applying time series models.
Further, the results of the frequency analysis show that, for a given return period, the flood peak increased significantly after urbanization. This increase can be attributed to the conversion of forests and rangelands into cultivated and urban areas. The rate of change decreases for higher return periods, as expected, since the effect of land use diminishes for larger floods.
As mentioned earlier, the main purpose of this study was to find the best methodology for simulating river streamflow in order to predict reliable discharges at locations undergoing land-use change. The presented methodology (Fig. 3 and 4) can be used to optimize the adjusted annual maximum streamflow to meet both homogeneity and best model fitness at the Langat River and at other catchments facing similar issues. To account for the effect of land-use change while achieving the best model fit, the pre-urbanization data of the Langat River were adjusted with the proposed methodology (Fig. 3). Successful application of the methodology requires careful determination of the optimal adjustment factor for each catchment. It should be borne in mind that this factor relies solely on the trend of urbanization, and the magnitude of the selected adjustment factor depends on this trend, which differs from one urbanized catchment to another.
The two curves in Fig. 3 intersect at a point that gives the optimum adjustment factor for both data homogeneity and the best fit of the time series models. To locate the optimum adjustment factor accurately, a sufficient range of modification factors must be applied to obtain enough F values. The range of these factors determines the location of the intersection point and hence the magnitude of the optimal adjustment factor used for event prediction, as shown in Fig. 3.
An accurate and precise evaluation of simulation results is also necessary to ensure the highest possible accuracy. In this study, the evaluation of the model predictions against the historical streamflow record indicates that the generated streamflows (from the Autorun and Normal models) and the historical record agree within the 95% confidence level, as shown in Table 5. The results of this study are encouraging; a future application of the procedure might be to evaluate the effect of watershed and climate changes on streamflow and to provide models applicable to non-homogeneous data.

CONCLUSION
In this study, the effect of land-use change on peak streamflow was quantified, and a methodology for applying time series models to non-homogeneous data was presented. Several findings can be drawn specifically for the Langat River:
• Inspection of the recorded flood data before and after urbanization in the Dengkil catchment shows that the mean and standard deviation of the floods increased by 100.3 and 104%, respectively
• Peak streamflow increased significantly after urbanization. However, land-use change is more effective in increasing floods with low return periods. For instance, land-use change caused a 107% increase in the 2-year flood magnitude but a 98% increase for the 100-year return period
• A methodology was presented and applied to the Dengkil watershed. Using this methodology, an adjustment factor of 1.9 was found to meet both homogeneity and best model fitness
• In terms of homogeneity and model verification, the results show the ability of the proposed method to predict river flow by applying time series models in a watershed undergoing land-use change
The results of the present study are useful for flood control projects and for assessing the flood characteristics of watersheds undergoing land-use change. The method is flexible and easy to use, and the simplicity of the information it provides is also an advantage. The approach can be constructed from observed data. Analysis of other case studies, involving different catchments affected by land-use change and different time series models, could confirm the results of the present study.