The Influence of Soil Characteristics in Low Flows Regionalization

Problem statement: Low flows regionalization is a relevant issue for water resources management, for example in the definition of environmental flows requirements. This study focused on the controls on the seasonal and spatial variability of q95 (i.e., the specific discharge that is exceeded 95% of the time), with particular reference to the role of soil characteristics, which, like soil infiltration rate, aquifer recharge, evapotranspiration and topography, usually play a relevant role in low flows seasonality and occurrence within a river. Approach: The study area comprised the Piemonte and Valle d’Aosta Regions (North-Western Italy; 30,027 km²), where 41 catchments were analyzed with the aim of building robust regression models enabling the transfer of hydrological information from gauged to un-gauged sites. Results: The regionalization method consisted of multiple regression models between low flows and catchment characteristics. Twenty-five catchment descriptors were used, their relative influence was checked with the multi-regressive procedure and special attention was devoted to the selection of significant soil characteristics in the regionalization process. Seasonality indices were used to classify catchments into two sub-regions and separate multiple regressions were performed, checking the prediction performance with cross-validation. A global regression was also fitted, but it yielded a lower performance. In the study domain land use, topography and the Thornthwaite moisture index proved to be the most significant variables for representing relationships between catchment soil characteristics and low flows regime. Conclusion/Recommendations: Results obtained in this study were comparable with other regionalization studies carried out in Austria and Switzerland. The interpretation of the identified regression models provided, at local scale, new tools for water management and environmental flows requirements and, from a wider point of view, useful insights into the general comprehension of low flows processes.


INTRODUCTION
An accurate estimate of low flows is needed in many branches of water resources management, including the definition of environmental flows requirements. The literature offers many techniques for estimating low flows regimes at un-gauged sites. Typically, catchments are classified into sub-regions according to physiographic characteristics and flow data are transferred between catchments in the same region. The most widely used method is the regional regression approach [9]. For each sub-region this method relates specific low flows indices (e.g., q95, the specific discharge that is exceeded 95% of the time) to catchment characteristics through linear or non-linear relationships.
Smakhtin [9] and Demuth and Young [2] give extensive lists of references for regionalizing low flows. In most of these methods, the study domain is divided into sub-regions in which the low flows behavior is assumed to be homogeneous. These regions are identified by grouping the gauged sites according to a classification criterion and checking the performance of low flows prediction with cross-validation. In the Piemonte and Valle d'Aosta regions (North-Western Italy) the low flows regime is characterized by two different dry seasons (winter and summer low flows). The consequence is a slow depletion of the soil reservoir, in accordance with the recession of discharge within the river. In Alpine climate, winter low flows are affected by freezing processes of snow cover or glaciers. Summer low flows, instead, occur during dry periods when evaporation exceeds precipitation and normally depend on aquifer recharge. Since recent studies have shown that it performs well as a classification criterion, seasonality is potentially useful for identifying homogeneous groups of catchments [3,5,7]. Because of the fundamental differences between summer and winter processes, regionalization may take advantage of a separation into sub-regions based on low flows seasonality.
Gains and losses of stream flow during seasonal dry periods are generated by different natural factors, which include soil types and morphological watershed characteristics. Topography, aquifer hydraulics, soil infiltration rate, vegetation distribution and types, geology and climate are the most important factors in low flows processes. These driving forces operate on a catchment scale and affect the discharge released from watershed storage [9]. The relevance of the different 'gain' and 'loss' processes across the wide range of soil types, topography and climatic conditions that exist naturally is far from obvious.
Using a comprehensive multi-regressive approach (i.e., regional regression) it is possible to select the most influential descriptors for the geographic context of the study domain [11] . Results of this procedure can ensure complete control of the quality of estimates obtained with regression models based on different kinds of descriptors. In this study, several morphologic and climatic attributes of catchments were selected and computed for 41 river basins in North-Western Italy in order to have an accurate regional estimation of low flows indices.

MATERIALS AND METHODS
This study was carried out in the Piemonte and Valle d'Aosta regions, which have different orographic and climatic characteristics. In this relatively small area (30,027 km²) the climate changes from the Apenninic-Mediterranean one, with summer low flows in the south-eastern hills, to the Alpine-continental one in the Alps mountain range, characterized by a winter low flows regime. The catchments considered have areas between 21 and 1,800 km², elevations ranging from 106 to 4,725 m and mean annual precipitation ranging from 841 mm in the south-eastern hills to a maximum of 2,113 mm in the northern mountainous areas. For this reason, an analysis of the influence of soil characteristics on the low flows regime in this territory is both complex and interesting. Figure 1 shows the spatial distribution of the stream gauges considered in this study, while Table 1 lists, for each of them, the related catchment area, the Mean Annual Runoff (MAR) and the specific discharge q95.
Discharge data: The reliability of the stream flow data series was taken into account in the choice of gauges. The most important factors were the presence of dams in the upstream part of the catchment, a minimum of 10 years of daily stream flow records and the relevance of abstractions or karst effects during low flows periods. For instance, gauges located on the plains were excluded owing to flow alterations due to irrigation abstractions and reservoirs in the upstream part. Daily discharge data series from 41 stream gauges, recorded between 1942 and 1975, were used in this study. As a reference for the low flows regime, the low flows index q95 (i.e., Pr(q>q95) = 0.95, the discharge exceeded on 95% of all days of the measurement period) was selected because of its wide use in the literature and its relevance for multiple topics of water resources management [9]. q95 is standardized by the catchment area (L s⁻¹ km⁻²) in order to make it more comparable across scales, expressing a characteristic unit runoff. The selected catchments cover a total area of more than 12,000 km², which is about 40% of the entire study domain.
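The definition of q95 given above can be illustrated with a minimal sketch. It assumes daily discharges in m³ s⁻¹ and a catchment area in km²; the function name `specific_q95` and the synthetic series are hypothetical and are not taken from the study.

```python
import numpy as np

def specific_q95(daily_discharge_m3s, area_km2):
    """Return q95 in L s^-1 km^-2 from daily discharges given in m^3 s^-1."""
    q = np.asarray(daily_discharge_m3s, dtype=float)
    q = q[~np.isnan(q)]  # ignore days with missing records
    # The discharge exceeded on 95% of days is the 5th percentile of the series
    q95_m3s = np.percentile(q, 5)
    # Convert m^3 to litres and standardize by the catchment area
    return q95_m3s * 1000.0 / area_km2

# Synthetic example (assumed data): 100 days with 1..100 m^3/s, 200 km^2 basin
example_q95 = specific_q95(np.arange(1, 101), 200.0)
```

NumPy's default linear interpolation between order statistics is used here; other percentile conventions would give slightly different values for short records.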

Catchment characteristics:
The choice of the catchment descriptors used in low flows regionalization depends largely on the availability and quality of the data. Demuth and Young [2] analyzed the frequency of different categories of catchment descriptors in 120 low flows estimation models and found that 73% of all catchment characteristics used are drainage basin parameters, 22% are climatic parameters and 5% are hydrological parameters. For the drainage basin parameters, 46% are morphometric descriptors, 17% surface cover, 10% geology and 10% soil characteristics. In this study, 25 morphoclimatic watershed characteristics were used for low flows regionalization. These descriptors give synthetic information on the shape of the basin surface, the nature of the soil and vegetation, the topography and the climate (Table 2). Owing to the low spatial accuracy of the digital information, geology parameters were not considered and land use and runoff curve number were used as geological surrogates [4,3]. The Thornthwaite moisture index, the runoff curve number and the drainage density were also included in the analysis because of their relationship with geology, soil infiltration rate and vegetation type distribution. Morphometric parameters of drainage basins and river networks were computed using GIS tools and the 'R' statistical software was used for the computation of statistical indices [11]. Some of the catchment characteristics had to be adapted to make them more useful for regionalization. For instance, the original Corine land cover classification can be condensed into 5 land use classes. Drainage basin descriptors were divided into different categories, each identified by a capital letter: catchment area A, elevation H, physiographic slope S, orientation O, watershed parameters W, land use L, climatic parameters C. Table 2 shows a summary of these catchment characteristics. The less common catchment descriptors among these are described in the following [11].
Starting from the physiographic slope S, a mean slope invariant with respect to DEM resolution (S_inv) was considered, defined as:
Where:
A = The catchment area
H_med = The median elevation
H_min = The elevation of the closing section
This is a slope measure of a square equivalent basin, which does not account for basin shape and whose definition is objective.
For catchment shape, the factors W_SF and W_CR were chosen. The first one expresses the ratio between the catchment area and the square of the longest drainage path length, while W_CR is the circularity ratio between the drainage basin area and the area of a circle having the same perimeter. In the land use section L, the runoff curve number (L_CN) is an empirical parameter used in hydrology for predicting direct runoff or infiltration from rainfall excess. L_CN is based on the hydrological soil groups, land use, treatment and hydrological moisture condition. The curve number method was developed by the USDA Natural Resources Conservation Service, which was formerly called the Soil Conservation Service (SCS). In the literature L_CN is also known as the 'SCS runoff curve number' [10].
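The two shape descriptors follow directly from their verbal definitions, as the short sketch below suggests. The function names and the example values are hypothetical; the only assumption is that area, path length and perimeter are expressed in consistent units (here km² and km).

```python
import math

def shape_factor(area_km2, longest_path_km):
    """Shape factor: catchment area over the squared longest drainage path length."""
    return area_km2 / longest_path_km ** 2

def circularity_ratio(area_km2, perimeter_km):
    """Circularity ratio: catchment area over the area of a circle with equal perimeter."""
    # A circle of perimeter P has area P^2 / (4 * pi)
    circle_area = perimeter_km ** 2 / (4.0 * math.pi)
    return area_km2 / circle_area

# Hypothetical basin: 100 km^2, longest drainage path 20 km, perimeter 50 km
w_sf = shape_factor(100.0, 20.0)
w_cr = circularity_ratio(100.0, 50.0)
```

A perfectly circular basin gives a circularity ratio of 1, and elongated or irregular basins give smaller values.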
For climatic parameters C, the Thornthwaite index (C_IT) and the Budyko index (C_IB) were considered. C_IT is a global moisture index that can be estimated as the ratio:
Where:
ET_0 = The mean annual potential evapotranspiration on the basin
C_IB, instead, is a radiational aridity index expressed as:
Where:
R_n = The mean annual net radiation
λ = The latent heat of vaporization
Values assumed by C_IB are lower than 1 for humid regions and greater than 1 for arid regions [11].
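For orientation, forms of these two indices that are commonly found in the literature are sketched below. The symbol P (mean annual precipitation) and the specific form of C_IT are assumptions and are not confirmed by this study, whose exact expressions may differ.

$$C_{IT} = \frac{P - ET_0}{ET_0}, \qquad C_{IB} = \frac{R_n}{\lambda P}$$

Under these assumed forms, C_IT is positive where precipitation exceeds potential evapotranspiration, and C_IB is below 1 where the energy available for evaporation is smaller than that needed to evaporate the annual precipitation, which agrees with the humid/arid thresholds quoted above.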
Regional regression analysis: The regional regression analysis was performed in two steps. The first divides the study domain into sub-regions in which the low flows behavior is assumed to be homogeneous; the second builds a multi-regressive model that relates q95 (i.e., the dependent variable) to the morphoclimatic descriptors (i.e., the independent variables) in order to select the most influential descriptors for low flows regionalization.
Recent studies have shown the reliability of the seasonality indices method for classifying catchments in order to divide the study domain. Laaha and Blöschl [6] investigated four catchment grouping strategies when developing multi-regressive models to estimate low flows indices in Austria. Owing to the large differences in low flows occurrence, a catchment grouping based on seasonality gave the best performance. Engeland and Hisdal [3] compared a regional regression model with a regional rainfall-runoff model for regionalizing low flows in Norway. In the regional regression they used seasonality indices to divide the territory into sub-regions. In that case too, the regression method generally gave better estimates of low flows in un-gauged catchments.

Seasonality indices:
In order to investigate the low flows seasonality, three indices were used, as in Laaha and Blöschl [7].
The first one is the Seasonality Ratio (SR), which expresses the ratio of summer (q95s) to winter (q95w) low flows. SR is defined as:

$$SR = \frac{q_{95s}}{q_{95w}}$$

Daily discharge time-series from April 1st to November 30th are considered as summer discharges and those from December 1st to March 31st as winter discharges. Values of SR>1 indicate the presence of a winter low flows regime and values of SR<1 indicate the presence of a summer low flows regime (Fig. 2a).
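A minimal sketch of the seasonality ratio is given below. It assumes that the summer season runs from April to November and the winter season from December to March, and that each daily discharge is accompanied by its calendar month; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def seasonality_ratio(months, discharge):
    """Return SR = q95 of summer days divided by q95 of winter days."""
    months = np.asarray(months, dtype=int)
    q = np.asarray(discharge, dtype=float)
    summer = (months >= 4) & (months <= 11)   # April 1st to November 30th
    q95_summer = np.percentile(q[summer], 5)  # exceeded on 95% of summer days
    q95_winter = np.percentile(q[~summer], 5) # exceeded on 95% of winter days
    return q95_summer / q95_winter

# Synthetic Alpine-like example (assumed data): low winter flows, higher summer flows
example_months = [1] * 10 + [7] * 10
example_flows = [2.0] * 10 + [4.0] * 10
sr_example = seasonality_ratio(example_months, example_flows)
```

In this example the winter discharges are the smaller ones, so the ratio is greater than 1, which corresponds to a winter low flows regime.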
The second seasonality parameter is composed of two indices, θ and r [7,12], which represent the seasonal distribution of low flows occurrence. The parameter θ is a circular statistic whose values range between 0 and 2π. With θ and r, the mean day of occurrence and the intensity of seasonality can be displayed by using a vector map (Fig. 2b).
The third seasonality index is expressed with seasonality histograms on a monthly scale. Columns represent the frequency of discharges below the threshold q95 over time (Fig. 2c). Regions with approximately homogeneous seasonality are shown in Fig. 2d. Group 1 includes the Alpine region, characterized by winter low flows, and Group 2 refers to the Apenninic-Mediterranean climate, where low flows normally occur during summer.
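One common way of obtaining such a pair of circular statistics is sketched below. It is an assumption that the study follows this formulation: each low flows day is mapped to an angle on the unit circle, θ is the direction of the mean vector and r is its length (0 for no seasonality, 1 when all events fall on the same day). The function name and the 365-day year are hypothetical choices.

```python
import numpy as np

def seasonality_index(days_of_year, year_length=365.0):
    """Return (theta, r, mean_day) for the days on which low flows occur."""
    d = np.asarray(days_of_year, dtype=float)
    angles = 2.0 * np.pi * d / year_length  # day of year mapped to an angle
    x = np.mean(np.cos(angles))
    y = np.mean(np.sin(angles))
    theta = np.arctan2(y, x) % (2.0 * np.pi)  # mean direction in [0, 2*pi)
    r = np.hypot(x, y)                        # strength of the seasonality
    mean_day = theta * year_length / (2.0 * np.pi)
    return theta, r, mean_day

# Events concentrated on one day give r = 1
theta_ex, r_ex, day_ex = seasonality_index([91.25, 91.25, 91.25])
```

Events spread evenly over the year cancel out in the mean vector, so r approaches 0 and θ becomes uninformative.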
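The monthly seasonality histogram can be sketched as follows, assuming that each daily discharge is accompanied by its calendar month and that a day counts as a low flows day when its discharge is at or below q95. The function name and the synthetic series are hypothetical.

```python
import numpy as np

def low_flow_monthly_frequency(months, discharge):
    """Return the fraction of low flows days (q <= q95) falling in each month."""
    months = np.asarray(months, dtype=int)
    q = np.asarray(discharge, dtype=float)
    q95 = np.percentile(q, 5)  # threshold exceeded on 95% of days
    low = q <= q95
    # Count low flows days per month; index 0 is unused because months are 1..12
    counts = np.bincount(months[low], minlength=13)[1:13]
    return counts / counts.sum()

# Synthetic example (assumed data): 10 days per month, low discharges only in August
example_months = np.repeat(np.arange(1, 13), 10)
example_flows = np.where(example_months == 8, 1.0, 10.0)
frequency = low_flow_monthly_frequency(example_months, example_flows)
```

Here all low flows days fall in August, so the histogram has a single column, as expected for a strongly seasonal summer regime.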

Low flows indices estimation:
Discharge data used in this study were daily discharges. To make the low flows characteristics more comparable across scales, q95 was standardized by the catchment area (q95 [L s⁻¹ km⁻²]). The types of linear models investigated were divided into four classes:
Where:
x_i = The morphoclimatic descriptors
β_i = Regression coefficients
The analysis of low flows data series showed that the distributions of q95 and its transformations were approximately normal. For the estimation of the coefficients β_i the Ordinary Least Squares (OLS) technique [8] was used. For all regression models, all combinations of the morphoclimatic variables were attempted, subject to three general assumptions: the absence of multicollinearity among the descriptors x_i, the homoscedasticity (Var[res_i] = const.) and the unbiasedness (E[res_i] = 0, guaranteed by the OLS procedure) of residuals, where res_i is the residual for catchment i. All models in which at least one of the independent variables was non-significant according to the Student t test at the 5% significance level were discarded. The descriptive power of each regression was assessed through the adjusted coefficient of determination R²_adj, defined as:

$$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}$$

where n is the number of catchments and p is the number of regressors. For each class, the multi-regressive model with the best performance in terms of R²_adj, the lowest RMSE_cv (the best model) and the most commonly available parameters (the simplest model) was chosen. Finally, the selected model was checked with respect to the assumptions underlying the regression analysis.
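The fitting step can be illustrated with a minimal OLS sketch in NumPy. It is not the authors' R implementation: the function name, the synthetic descriptors and the use of the standard adjusted R² formula with n observations and p regressors are assumptions made for illustration only.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + sum(b_i * x_i) by OLS; return coefficients, R2, adjusted R2, t statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # add the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Standard adjusted R2 for n observations and p regressors
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Coefficient standard errors from the residual variance
    sigma2 = ss_res / (n - p - 1)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, r2, r2_adj, t_stats

# Synthetic example (assumed data): 41 catchments, 2 descriptors
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(41, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=41)
beta_demo, r2_demo, r2_adj_demo, t_demo = fit_ols(X_demo, y_demo)
```

Each t statistic would then be compared with the critical value of the Student t distribution with n - p - 1 degrees of freedom, and a model would be discarded if any descriptor fell below it.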
Multicollinearity affects the OLS procedure by producing large variances and covariances for the least-squares estimators of the regression coefficients. A simple statistic for measuring the presence of multicollinearity is the Variance Inflation Factor (VIF) (e.g., [8]):

$$VIF_j = \frac{1}{1 - R_j^2}$$

Where:
R²_j = The coefficient of determination obtained when the independent variable x_j is regressed on the remaining p-1 regressors
Practical experience indicates that if any of the VIFs exceeds 5 or 10, the associated regression coefficients are poorly estimated because of multicollinearity.
Finally, normality of residuals is required for hypothesis testing (the significance t test). To detect non-normality and heteroscedasticity, the residuals were plotted on a normal probability plot and against the fitted values, respectively, in order to recognize whether they display particular patterns.
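As a hedged illustration, the VIF of each descriptor can be computed by regressing it on the remaining descriptors and applying VIF = 1/(1 - R²). The function name and the test matrices below are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (one column per descriptor)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress descriptor j on an intercept plus the remaining p-1 descriptors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2_j)
    return vifs

# Uncorrelated descriptors give VIF = 1; nearly identical ones give large values
vif_independent = variance_inflation_factors([[1, 1], [-1, 1], [1, -1], [-1, -1]])
x1 = np.arange(10, dtype=float)
x2 = x1 + 0.1 * (-1.0) ** np.arange(10)
vif_collinear = variance_inflation_factors(np.column_stack([x1, x2]))
```

A descriptor that is an exact linear combination of the others would make 1 - R² equal to zero, so such columns should be removed before calling the function.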
As a first approach, one global regression model was fitted to all 41 catchments. In the second step, corresponding to the original classification of catchments obtained by seasonality indices analysis, regionally restricted regression models were each fitted for the two contiguous regions (Fig. 2d).

RESULTS
The best regressions were chosen on the basis of the criteria previously discussed, considering all the possible linear regression models. Table 3 outlines the best regressions obtained for each model class, along with R² and RMSE_cv statistics. The chosen multi-regressive relationships and their performance are reported in Table 4, which also outlines two measures of model performance, the coefficient of determination R²_cv and the root mean squared error RMSE_cv. Both were obtained from cross-validated residuals and, therefore, are representative of the prediction of low flows in ungauged catchments.
Fig. 3: Diagnostic plots of residuals. Each column refers to one regression model (global and sub-regional models)
The global regression showed a performance of R²_cv = 60%, corresponding to RMSE_cv = 2.336 L s⁻¹ km⁻². Grouping catchments into two sub-regions using seasonality indices and fitting separate regressions improved the overall model performance to R²_cv = 72% and RMSE_cv = 1.1931 L s⁻¹ km⁻².
Normality and homoscedasticity of residuals are desirable properties if one is interested in interpretable estimates of model performance. In this study, model assumptions (i.e., normality and homoscedasticity of residuals) were carefully checked with three diagnostic graphs: scatter plots of observed versus predicted values, residual plots as a function of observed values and normal probability plots of residuals. The diagnostic graphs derived from the regression results are shown in Fig. 3. Each column corresponds to one regional regression model and each point to one catchment. Proceeding by rows, the first two graphs allow a detailed examination of the performance for individual catchments, including the existence of outliers and a potential heteroscedasticity (diversity in variance) of the residuals. In the graphs the magnitude of the outliers does not tend to increase with q95 and the residuals of the three models can be considered homoscedastic. In the context of this study, where the main focus is on evaluating the influence of soil characteristics on low flows regionalization, the results of this check are positive for all regression models.
The third graph represents the cross-validated residuals in a normal probability plot. Only single extreme outliers appear and the residuals can be considered approximately normally distributed. In order to check for multicollinearity, the VIF was computed for all the regressors considered. Its values range between 1.02 and 2.65 for all descriptors considered in the regression models. In all cases the VIF is well below 5, the value indicating possible multicollinearity.
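The two cross-validated statistics can be sketched as follows. The study states only that cross-validation was used; the leave-one-out scheme, the definition of R²_cv as one minus the ratio of cross-validated residual variance to observed variance, and the function name are assumptions made for this example.

```python
import numpy as np

def loo_cross_validation(X, y):
    """Return (R2_cv, RMSE_cv) from leave-one-out residuals of an OLS model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    cv_resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # treat catchment i as ungauged
        beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        cv_resid[i] = y[i] - design[i] @ beta
    rmse_cv = np.sqrt(np.mean(cv_resid ** 2))
    r2_cv = 1.0 - np.sum(cv_resid ** 2) / np.sum((y - y.mean()) ** 2)
    return r2_cv, rmse_cv

# Noise-free linear data (assumed) are predicted exactly when left out
x_demo = np.arange(8, dtype=float).reshape(-1, 1)
y_demo = 3.0 + 2.0 * x_demo[:, 0]
r2_cv_demo, rmse_cv_demo = loo_cross_validation(x_demo, y_demo)
```

Because every catchment is predicted from a model fitted without it, these statistics are usually less favourable than the in-sample R² and reflect the performance expected at ungauged sites.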

Relative importance of catchments descriptors:
A more detailed analysis of Tables 3 and 4 yields the relative importance of predictor variables in the context of low flows regionalization. Starting from the global regression model, three catchment characteristics are important. The percentage of catchment area above 2000 m (A_2000), the mean slope (S) and the proportion of crops and grassland (L_CG) prove to be the most significant variables for the transfer of hydrological information. Especially for the geographic context under examination, L_CG appears in all selected regression models and is the most influential land use descriptor.
Grouping into two regions and fitting separate regressions in each region led to two regression equations that are similar in terms of the descriptors considered. Land use, river length and the Thornthwaite index appear in both models as the most influential descriptors. The two sub-regional models differ in some of the selected regressors. In particular, in the regression models of Group 2, A_2000 and L_CG are replaced by the similar descriptors H_max (maximum elevation) and L_F (proportion of forest).
In the context of this study, soil characteristics such as runoff curve number, stream network density and the proportions of urbanized areas, rocks and wetlands appear to have very little influence on low flows. They are never selected in the regression equations.

DISCUSSION
Soil characteristics, and especially land use, prove to be significant variables for the regionalization of low flows indices. It is interesting to compare this result with studies in the literature that used similar catchment characteristics and examined q95 specific discharges.
The frequency with which catchment descriptors are used depends largely on the availability and quality of the data. A comprehensive review of the catchment descriptors that control low flows is provided by Smakhtin [9], including an overview of the catchment descriptors used in regional estimation models. Catchment area, mean annual precipitation, channel and/or catchment slope, stream frequency and/or density, percentage of lakes and forested areas, various soil and geology indices, length of the main stream, catchment shape, watershed perimeter and mean catchment elevation are the morphoclimatic characteristics most commonly used.
In this study 25 catchment descriptors were used and their relative influence was checked with the multi-regressive procedure. In the resulting regional regression models, only 7 catchment characteristics were selected. Moreover, as shown in Table 2, the regression models contained similar parameters and most of them occurred in different regression classes. The overall regression model yielded the lowest performance (R²_cv = 60%). For this global model three catchment characteristics were selected as predictors: A_2000 (catchment area above 2000 m), S (mean catchment slope) and L_CG (proportion of crops and grassland). In Alpine climate A_2000 is related to the catchment topography and consequently to its influence on snow cover and glacier formation. These processes characterize the whole Alps mountain range in Northern Italy and regulate the hydrological response during winter low flows periods. Smakhtin [9] gives an extensive list of low flows studies related to the influence of glaciers and snow pack. Freezing and melting processes lead to a decrease in runoff variation and, consequently, to more sustained low flows. The mean slope S generally has a positive effect on low flows [7] and is possibly correlated with storage volume in high mountains.
Grouping catchments into two sub-regions gave the best performance (R²_cv = 72%). Separate regressions were carried out, using seasonality indices for catchment classification. Catchment relief was also represented in the sub-regional equations, by A_2000 or maximum altitude (H_max). Hence, altitude and slope (catchment topography) appear to be an important control of low flows in the study domain. Other catchment characteristics that appeared in the sub-regional regressions are W_RL (river length), L_F (proportion of forest) and C_IT (Thornthwaite moisture index). River length relates to the longest drainage path and its interactions with aquifers. Engeland and Hisdal [3] found main river length to be a reference for the winter low flows regime in Norway.
Land use is significant in all selected models in the study domain. It affects evapotranspiration and soil infiltration rate, which are related to water losses from low flows discharges during dry periods. The proportion of forest (L_F) and the proportion of crops and grasslands (L_CG) are the land-use characteristics that appear most frequently in the regressions. In contrast with Laaha and Blöschl [7], the proportions of rocks and wetlands were never selected and did not appear to be important controls. On the other hand, Aschwanden and Kan [1], in a study of low flows regionalization for Switzerland, pointed out that land use plays an important role in predicting low flows indices, especially the proportion of agricultural areas and pre-Alpine farming structures.
The climatic parameter C_IT (the Thornthwaite index) expresses the ratio between mean annual precipitation and evapotranspiration. It relates to the global moisture of the catchment as a reference for the average water balance. It was selected in both separate regression models, supporting the finding that the Thornthwaite index is one of the most important controls of low flows in North-Western Italy.
Catchment area, orientation parameters, watershed shape factors, runoff curve number and the Budyko aridity index were never selected in the regression equations, so they appear to have very little influence on low flows. In agreement with Laaha and Blöschl [7], stream density W_DD (a parameter related to geology and land use) also never occurred in the regression equations and does not appear to be a significant indicator of low flows for the study domain.
Results obtained in this study are therefore comparable with regionalization studies in the literature, especially those in Austria and Switzerland. We believe that the interpretation of the regression models provides useful insights into the comprehension of low flows processes. Topography, the Thornthwaite moisture index and land use conditions appear to be the most influential parameters for regionalizing low flows in North-Western Italy.

CONCLUSION
Gains and losses of stream flow during seasonal low flows periods are generated by different natural factors, which include morphoclimatic descriptors of catchments. The objective of this study was to examine the relative influence of soil characteristics in regionalizing low flows. The specific discharge q95 was considered as an index of the low flows regime, with regard to its estimation in un-gauged basins within the Piemonte and Valle d'Aosta regions (North-Western Italy). Regionalization is of primary importance in water resources management because it transfers hydrological information from gauged to un-gauged sites. Multiple regression analyses were used and tested in order to establish a relationship between catchment characteristics and the low flows index q95. Based on the results, catchment grouping based on the dominant low flows seasonality was an effective method for obtaining homogeneous sub-regions in North-Western Italy, where winter low flows and summer low flows are controlled by different processes. The best results in this study were obtained with an overall predictive performance of R²_cv = 72%. Topography, the Thornthwaite moisture index and land use conditions are the best indicators for estimating low flows in ungauged catchments. In agreement with other studies in the literature, these kinds of analyses are useful for assessing the relative importance of different kinds of descriptors. Moreover, the interpretation of soil and land use parameters using regression models provides useful insights into the comprehension of low flows processes.