Comparative Analysis of the Efficacy of Landslide Susceptibility Models

Corresponding Author: Gabriela Guimarães Gouvêa de Oliveira, School of Civil Engineering, UFJF Federal University of Juiz de Fora, Juiz de Fora, Brazil. Email: gabriela.oliveira@engenharia.ufjf.br

Abstract: This research analyzes the efficacy of the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models in evaluating landslide susceptibility. Statistical analyses were performed on both the efficacy of the models in predicting landslide risk and the concordance between the models according to landslide occurrence or non-occurrence. For this work, logistic regression, Receiver Operating Characteristic (ROC) curves, the Kappa statistic and concordance analysis were applied to a sample of 15,544 incidents reported from 1996 to 2012 in the city of Juiz de Fora, Brazil. The analysis included 855 confirmed landslide occurrences and 14,689 unconfirmed occurrences. The analysis of the historical record of occurrences showed the need to add new variables to those already included in the susceptibility models. In many cases where the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models pointed to a low possibility of a landslide, landslides of great significance occurred, some of them with casualties. The importance of this study lies in assessing the efficacy of these models and indicating new complementary variables. The results show that an anthropogenic variable is necessary, because slopes with similar geotechnical characteristics are subjected to demands different from those of natural conditions.


Introduction
In recent decades, urbanization has increased all over the world, mainly in developing countries. Today about half of the planet's population lives in urban areas. Cities have become the main expression of the environmental modification made by human beings and they represent the troubled relationship between man and nature.
In Brazil, this process is not different, especially after the intense urbanization that occurred in the post-World War II period. Most Brazilian cities expanded without adequate urban planning, resulting in the emergence and/or worsening of several environmental and social problems. However, environmental problems do not affect the entire urban space homogeneously in terms of distribution and intensity and are often related, in the Brazilian case, to the most undervalued areas whose physical spaces are occupied by underprivileged social classes. Socioeconomic and human losses are expected to increase because climate change will probably result in a higher frequency of landslides caused by rainfall increase and by social pressure, which drives people towards sloped areas (World Bank, 2010).
The expansion of urbanized areas, characterized by the sealing of large occupied surfaces, directly reduces the capacity of water to infiltrate into the soil, increasing surface runoff and therefore changing the entire operation of the hydrological cycle in cities, with important consequences for the population residing there. Episodes of intense rainfall, characteristic of tropical regions, combine with the intensification of the disordered occupation of sloped areas, the removal of vegetation cover and the misuse, mishandling and poor conservation of soil, and contribute significantly to the increase of urban environmental problems related to mass movements.

Landslide susceptibility models are powerful tools for providing better knowledge of the dynamics of mass movements in urban areas located in tropical regions. These models may significantly help in identifying areas most prone to the occurrence of this phenomenon by using different techniques, such as the logistic regression applied in this study. The results from these models may also contribute to the better organization of the urban space.
Logistic regression is known to be one of the most suitable methods for evaluating landslide susceptibility (Hosmer et al., 2013). Several comparative studies demonstrate the superior performance of this technique as compared to others (Guzzetti et al., 2005; Mathew et al., 2009; Rossi et al., 2010; Vorpahl et al., 2012; Akgün, 2012; Felicísimo et al., 2013).
This multivariate technique is used to model landslide susceptibility mainly because it can operate with any kind of independent variable (ratio, interval, ordinal or nominal scale) and does not require the predictors or the residuals to follow a normal distribution. All discrete independent variables are binarized, that is, transformed into dichotomous or polychotomous variables. The dependent variable is binary and expresses the stable or unstable state of the mapping unit to be classified (Costanzo et al., 2014).
The application of these techniques requires comprehensive multi-temporal data series that are not always available. The limitations of the databases described in the literature concern the completeness of the historical series, the exact location of the events in time and space, the uncertainty about the number of people involved and the reliability of the sources (Petrucci and Pasqua, 2008; Petrucci and Pasqua, 2009; Jaiswal et al., 2010; Petrucci and Pasqua, 2012). The best way to verify the validity of a model is to apply it in areas different from those for which it was designed (Jaiswal et al., 2010).
One of the reasons why risk assessment and mapping remain challenging is the lack of temporal data on landslide occurrence (Pellicani et al., 2014). In other countries, such as Portugal, the works of Bateira (2001), Teixeira (2005), Pereira (2009) and Pereira et al. (2012) indicate that landslide susceptibility models based exclusively on physical variables have not yet been applied; instead, they report models based on mountain geomorphology and on the mechanical and hydrological information available for the surrounding massifs.
Several existing studies point to the need to understand the assessments of landslide susceptibility and to the relevance of models that produce effective results. In this sense, this research proposes to measure and compare the efficacy of the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models against real occurrence data, thus verifying how well the landslide risk indicated by the models agrees with the actual occurrences. In this way, it can also be checked whether the variables of these models respond properly to the landslide phenomenon. Based on the results, it will be possible to identify the false positives indicated by the models and to determine whether there is a historical dependency in the data on landslide occurrence or non-occurrence.

Materials and Methods
Our approach involved three steps: (i) characterization of the geoenvironmental context; (ii) data organization and application of the landslide susceptibility models; and (iii) comparative analysis of the efficacy of the landslide susceptibility models (Fig. 1).

Study Area
The city of Juiz de Fora is located in the state of Minas Gerais, Brazil, where strong urban expansion over the past decades and marked population growth led to the occupation of steeply sloped areas, as well as to the consequent increase of mass movements. The city's population, which was 169,440 inhabitants in 1960, reached 516,247 inhabitants in 2010, and the urbanization rate, which was 74.46% in 1960, reached 98.86% in 2010 (IBGE, 2010).
The study area corresponds to the eastern region of Juiz de Fora City, traditionally characterized by the occurrence of a large number of landslide events. Figure 2 presents this region, which is one of the areas most affected by mass movements in recent years. The Civil Defense reported in this area a total of 855 mass movement events from 1996 to 2012, frequently followed by human and material losses. Thus, the uneven topography, the presence of steep slopes, the high amount of summer rainfall, the sparse vegetation cover and the intense and disordered occupation of these areas by underprivileged social classes all constitute a framework that requires improved knowledge of these dynamics, in order to develop a series of measures of urban space management and planning that may mitigate the consequences of mass movements. The eastern area of Juiz de Fora is composed of 15.62 km² of urban area and 10.24 km² of rural area. It is a significantly large area, with elevations between 694.9 m and 1,043.4 m. Most of the terrain has slopes of up to 35°. However, there is a significant area with slopes greater than 45°.
About 60% of the slopes are less than 10 m high and only 20% in the eastern area are higher than 20 m. This represents a high potential for landslides.

Data Organization and Application of Landslide Susceptibility Models
Between 1996 and 2012, 855 georeferenced landslide occurrences were identified in the eastern region of Juiz de Fora. All registered cases were properly assessed and reported by an expert civil engineer in the geotechnical area.
From aerial photogrammetric surveys and laser profiling, both executed in 2007, it was possible to perform the digital terrain modeling with 1 m pixel resolution throughout the study area.
The municipal government already had a landslide susceptibility map, created using the SAGA/UFRJ model (Silva, 1990), in which four risk categories were described (low, medium, high and very high). Such categories were defined according to the IPT model (Carvalho et al., 2007).
The SHALSTAB model (SHALlow STABility model) couples a hydrological model with an infinite slope stability model. Dietrich and Montgomery (1988) first developed this model at the beginning of the 1990s at the University of California at Berkeley.
The SHALSTAB model is presented in Equation 1 (Silva et al., 2013), which defines the landslide susceptibility for each pixel.
The Instituto de Pesquisas Tecnológicas (IPT) model considers the forecast for landslides based on slope, slope curvature and soil state (soil-free, talus fragments and soil fractions altered areas).
The SAGA/UFRJ is a Geographic Information System (GIS) for environmental applications to analyze conventional georeferenced data supplementing reports and maps to support decision making actions.
The SMORPH model, which stands for Slope MORPHology (Shaw and Johnson, 1995), developed in the United States for forecasting superficial landslides, is an empirical model adapted to include the contributing area with the creeping process. This model only uses parameters derived from a Digital Elevation Model (DEM) to calculate susceptibility.
SMORPH requires slope and curvature thresholds as inputs to define the surfaces with regard to landslide potential. Slope thresholds needed to be calculated for five slope ranges: Relatively flat, low steepness, moderate steepness, high steepness and extremely high steepness.
Slopes with convex curvature tend to disperse the surface water, which prevents an aquifer from forming and hinders the development of pore water pressure in the soil. Slopes with concave shape, in contrast, make it easy for water to concentrate through surface and subsurface accumulation, which contributes to the instability of the slopes. Figure 3 presents the curvature types (flat, convex and concave).
The modified SMORPH model, SMORPHM, proposes an evaluation of landslide susceptibility that considers the possible presence of talus soil (a slope formed especially by an accumulation of rock debris at the base of a cliff).
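The slope and curvature thresholds that SMORPH takes as inputs can be pictured as a lookup matrix. The following is only a minimal sketch: the slope breakpoints, the curvature tolerance, the sign convention for curvature and the hazard classes in the matrix are all hypothetical values chosen for illustration, not the thresholds used in this study.

```python
from bisect import bisect_right

# Hypothetical breakpoints (degrees) between the five slope ranges:
# relatively flat, low, moderate, high and extremely high steepness.
SLOPE_BREAKS_DEG = [5.0, 15.0, 25.0, 35.0]

# Hypothetical tolerance: |curvature| at or below it is treated as flat.
FLAT_TOLERANCE = 0.1

# Hypothetical hazard matrix: one row per curvature type, one column
# per slope range (from relatively flat to extremely high steepness).
HAZARD_MATRIX = {
    "convex": ["low", "low", "low", "medium", "high"],
    "flat": ["low", "low", "medium", "high", "high"],
    "concave": ["low", "medium", "high", "high", "high"],
}


def curvature_type(curvature):
    """Classify curvature; negative means concave (assumed convention)."""
    if abs(curvature) <= FLAT_TOLERANCE:
        return "flat"
    return "concave" if curvature < 0 else "convex"


def smorph_hazard(slope_deg, curvature):
    """Look up the hazard class of one DEM cell from slope and curvature."""
    slope_range = bisect_right(SLOPE_BREAKS_DEG, slope_deg)
    return HAZARD_MATRIX[curvature_type(curvature)][slope_range]


print(smorph_hazard(30.0, -0.5))  # steep, concave cell
print(smorph_hazard(30.0, 0.5))   # steep, convex cell
```

Because both inputs are derived from the DEM, the same lookup can be applied to every pixel of the terrain model.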
The environmental module of SAGA/UFRJ has three basic geographical functions:
• Signature: defines the hypsometrical characteristics of specific areas selected by the user, which allows the identification of important variables
• Modification control: continuously inspects environmental phenomena through sequential time mapping
• Evaluation: superimposes maps, combining weights and scale valuations, to express the potential risk factors in a new map
Many data combinations may be developed this way and the maximum value of digital map distribution for environmental change may be produced and analyzed. This maximal contribution automatically allows the evaluation of all other classes of map for maximum value. The algorithm of this evaluation is presented in Equation 2 (Silva et al., 2013):
$$A_{ij} = \sum_{k=1}^{n} P_k \, N_k \qquad (2)$$
Where:
$A_{ij}$ = Georeferenced base pixel
$n$ = Number of maps
$P_k$ = Weight of each map "k", divided by 100
$N_k$ = Scale valuation for each map class
The distribution of occurrences was classified in a binary way as confirmed landslide occurrences and non-occurrences (when no landslides were identified). Out of a total of 15,544 incidents reported in the system at the request of the population between 1996 and 2012, only 855 were confirmed occurrences and 14,689 represented non-occurrences (i.e., occurrences that were reported but not confirmed when the Civil Defense staff visited the site).
The word repetition refers to repeated landslides in the same area. Based on the Civil Defense's database, each reported event was classified as an occurrence or a non-occurrence by means of an engineer's report, which is generated after the location is surveyed. The engineer's reports are essential to confirm occurrences. After analyzing the georeferenced reports, we could identify the distribution of the surveys with confirmed landslides, named "occurrences", as well as of the surveys in which no landslide was confirmed.
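The SAGA/UFRJ evaluation function (Equation 2) is a weighted sum of map scores per pixel. A minimal sketch is given below; the two small score grids, the map names and the 60/40 weights are hypothetical, and the check that the percentage weights add up to 100 is an assumption rather than something stated in the source.

```python
def weighted_overlay(score_maps, weights_percent):
    """Combine n score maps into one map: A_ij = sum_k P_k * N_k.

    score_maps: list of n grids (lists of rows) holding the scale
        valuation N_k of the class found at each pixel.
    weights_percent: list of n map weights in percent; P_k = weight / 100.
    """
    if len(score_maps) != len(weights_percent):
        raise ValueError("one weight is required per map")
    # Assumption: the percentage weights are expected to sum to 100.
    if abs(sum(weights_percent) - 100.0) > 1e-9:
        raise ValueError("weights must sum to 100")

    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    result = [[0.0] * cols for _ in range(rows)]
    for scores, weight in zip(score_maps, weights_percent):
        p_k = weight / 100.0
        for i in range(rows):
            for j in range(cols):
                result[i][j] += p_k * scores[i][j]
    return result


# Hypothetical 2 x 2 score maps (0 to 10) and weights.
slope_scores = [[2, 8], [5, 10]]
land_use_scores = [[4, 6], [0, 10]]
susceptibility = weighted_overlay([slope_scores, land_use_scores], [60, 40])
for row in susceptibility:
    print(row)
```

Each output pixel stays on the same scale as the input scores, so the result can be sliced directly into risk categories.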

Comparative Analysis of the Efficacy of Landslide Susceptibility Models
Using a Geographic Information System (QGIS® version 2.18), the records of landslide and non-landslide areas were cross-checked against the susceptibility models.
Data regarding confirmed landslide occurrences in the eastern region of Juiz de Fora were annually grouped and are presented in Table 1, followed by the number of occurrences per year in the period of 1996 to 2012.
An analysis of the data enabled us to identify a distribution of repeated occurrences in the same location, but the data presented in Table 2 did not indicate interference in the aims of this research on models for predicting landslide occurrences.
Reported cases in which the landslide occurrence was not confirmed are the so-called "non-occurrences".
Repetition of landslide occurrences was identified in the same geographic location during the analysis of the database; however, such repetition was not considered statistically significant with regard to the focus of this study.
Logistic regression models (Kleinbaum and Klein, 2010) were developed to estimate the landslide risk according to the diagnoses performed by the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models.
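As a worked illustration, suppose each susceptibility model enters the regression as a single binary diagnosis $x$ (1 for a positive sign, 0 for a negative sign) and $Y = 1$ denotes a confirmed landslide. This specification is an assumption made here for clarity; the source does not spell out the form of the fitted models. Under it, the coefficients relate directly to odds and odds ratios:

```latex
\begin{aligned}
\ln\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} &= \beta_0 + \beta_1 x, \qquad x \in \{0, 1\} \\
e^{\beta_0} &= \text{odds of occurrence given a negative diagnosis} \\
e^{\beta_0 + \beta_1} &= \text{odds of occurrence given a positive diagnosis} \\
e^{\beta_1} &= \text{odds ratio (positive versus negative diagnosis)}
\end{aligned}
```

An odds ratio whose confidence interval excludes 1 therefore corresponds to a coefficient $\beta_1$ that differs significantly from zero.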
The statistical analysis consisted of two main steps: (i) Analyzing the efficacy of models for predicting the risk of slope failure; and (ii) analyzing the concordance between the models according to landslide occurrence or non-occurrence.
In order to complement the analysis, Receiver Operating Characteristic (ROC) curves were built. These provide a graphical method of evaluation, organization and selection of diagnostic and/or prediction systems. A concordance analysis between the models was performed via Kappa statistic by considering all pair combinations of models.
The ROC curves were obtained for the final model of landslide prediction by a binary classification, that is, two classes designated as positive and negative. The analyzed ROC curves results were relevant in the process of comparing the predictive efficacy of each model used in this study.
Independence between observations was assumed in all analyses; however, spatial statistical models can be adopted to incorporate georeferenced information, thus modeling the possible dependence between occurrences.

Analysis of the Efficacy of the Models in Predicting Occurrences
The analysis of the efficacy of the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models was developed individually at first. It was based on the diagnoses of each model concerning landslide occurrences and non-occurrences, generating positive and negative results for efficacy against the occurrence data (Kleinbaum and Klein, 2010). Figures 4 to 8 present the results of applying the five susceptibility models to soil landslides in the eastern region of Juiz de Fora. For the development of the cartographic base, the geodesic reference was SIRGAS 2000, the Geocentric Reference System of the Americas 2000, and the coordinate system used was the Universal Transverse Mercator (UTM).
From a color scale ranging from green to red there is, respectively, the indication of low to high susceptibility to the occurrence of landslides. The blue lines identify the water courses present in this region.
It may be observed that in Fig. 6 the upstream part of the river basin is not filled with the landslide susceptibility distribution. This occurred because the information necessary to run the SAGA/UFRJ model, such as soil geomorphology, was not available for that specific area. Table 3 summarizes the analysis of the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models with regard to their efficacy in predicting the risk of occurrence. The positive sign indicates the prediction of an occurrence, and the negative sign represents a diagnosis in which the model did not anticipate the occurrence.
In the columns of the table, "prevalence" means the proportion of confirmed occurrences among the cases that the respective model classified as positive and as negative. The "odds" is defined as the risk of landslide occurrence, which is obtained through the ratio between the probability of occurrence and the probability of non-occurrence.
The "odds ratio" value, interpreted here as a relative risk, compares the odds of an occurrence when the model indicates the positive sign with the odds when it indicates non-occurrence.
The "efficacy ratio" was calculated as the ratio between the non-occurrence and occurrence cases for each sign indicated by the models (positive and negative). It represents how many times a landslide is more likely not to occur than to occur, or vice versa. Proportionally, the model better predicts the risk when the diagnosis refers to an occurrence compared to a non-occurrence diagnosis.
Furthermore, in order to provide a more complete understanding of the efficacy of each model in predicting landslide occurrences, a diagram that relates sensitivity and specificity was created. The "sensitivity" corresponds to the true positive rate, that is, the probability of predicting the event when it occurs. The "specificity" corresponds to the true negative rate; its complement (1 − specificity) is the false positive rate.
The summary of the analysis of the SHALSTAB, IPT, SAGA, SMORPH and modified SMORPH models is presented in Figs. 9 to 13 as ROC curves.
The Area Under the ROC Curve (AUC) corresponds to the predictive power of the analysis and this value is always between 0 and 1. Thus, the larger the AUC, the higher the predictive power of the model related to the event occurrence.

Analysis of SHALSTAB Model
For the SHALSTAB model, it was verified that, of a total of 11,591 cases diagnosed as positive, only 717 were confirmed landslides, whereas in 10,874 cases identified as positive by the model the landslide did not occur. Given a negative diagnosis, there were 138 occurrences; that is, even when the model indicates a non-occurrence, 3.49% of the cases are still expected to be landslides. The odds of occurrence when the model indicates a positive diagnosis are 6.59%. Likewise, when the model points to a negative diagnosis, the odds of occurrence are 3.62%. Both results of occurrence and non-occurrence accuracy are low.
According to the "odds ratio", the odds of occurrence when the model gives a positive diagnosis are 1.82 times higher than when the model points to non-occurrence. The confidence interval is 1.51 to 2.20; it does not include the value 1.0, so the result is significant.
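The SHALSTAB figures quoted above can be reproduced from the four cells of the 2 × 2 table. The sketch below uses the counts reported in the text; the 95% Wald (Woolf) interval on the log odds ratio is an assumption, since the source does not state how its confidence interval was obtained, although it yields the same bounds for this model.

```python
import math


def odds_summary(tp, fp, fn, tn, z=1.959964):
    """Prevalence, odds and odds ratio from a 2 x 2 diagnosis table.

    tp: positive diagnosis, landslide confirmed
    fp: positive diagnosis, landslide not confirmed
    fn: negative diagnosis, landslide confirmed
    tn: negative diagnosis, landslide not confirmed
    z:  normal quantile of the interval (1.959964 gives 95%)
    """
    odds_pos = tp / fp
    odds_neg = fn / tn
    odds_ratio = odds_pos / odds_neg
    # Assumed method: Wald interval built on ln(OR) with Woolf's error.
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return {
        "prevalence_pos": tp / (tp + fp),
        "prevalence_neg": fn / (fn + tn),
        "odds_pos": odds_pos,
        "odds_neg": odds_neg,
        "odds_ratio": odds_ratio,
        "ci_low": math.exp(math.log(odds_ratio) - z * se_log_or),
        "ci_high": math.exp(math.log(odds_ratio) + z * se_log_or),
    }


# Counts from the text: 15,544 reported cases, 855 confirmed occurrences.
total_cases, occurrences = 15544, 855
tp, fp = 717, 10874                     # positive SHALSTAB diagnoses
fn = occurrences - tp                   # 138 occurrences missed
tn = total_cases - occurrences - fp     # 3,815 true negatives

shalstab = odds_summary(tp, fp, fn, tn)
for name, value in shalstab.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to the other four models by replacing the two positive-diagnosis counts.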
In Fig. 9, for the SHALSTAB model, the maximum prediction value found is 0.035, which corresponds to the maximum value of the ROC curve.
The area under the curve is 0.549, which is considered a low value (admitting 1.000 as maximum value). This value will be evaluated later in comparison with other models analyzed in this research.

Analysis of IPT Model
It was verified that only 823 of the 13,926 cases diagnosed as positive by the model were confirmed, while in 13,103 cases the landslide indicated as positive by the model did not occur. Of events given a negative diagnosis by the IPT model, there were 32 occurrences, that is, even when the model identifies a non-occurrence, 1.98% of landslides are still expected to occur.
The risk occurrence when the IPT model performs a positive diagnosis corresponds to 6.28%. Likewise, there is a risk of 2.02% in the case of a negative diagnosis. As in the SHALSTAB model, the IPT showed low results (odds) of occurrence and nonoccurrence efficacy.
The possibility of occurrence, when the model classifies it as positive, is 3.11 times higher than the chance for it to occur when the IPT points to a non-occurrence. The confidence interval is 2.18:4.60, which is significant and does not include the 1.0 value.
A summary of the analysis is shown in Fig. 10. The maximum value of prediction corresponds to 0.020.
The area under the ROC curve corresponds to the predictive power of the analysis, whose value is 0.535, which is considered low and will also be evaluated later in comparison with the other models analyzed in this research.

Analysis of SAGA Model
It was verified that only 524 from a total of 8,224 cases were confirmed by the model, while in 7,700 cases the landslide indicated as positive by SAGA did not occur. Given a negative diagnosis by the model, we have 331 occurrences, that is, even when the model results in a nonoccurrence, 4.52% of landslides are still expected to occur.
The risk occurrence corresponds to 6.81% when the model provides a positive diagnosis. Likewise, when the model provides a negative diagnosis, the risk is 4.74%.
In this case, when the model classifies an occurrence as positive, we have a result 14.7 times more likely not to occur than to occur. When the model classifies an occurrence as negative, we have a result 21.1 times more likely not to occur than to occur.
A summary of the analysis is shown in Fig. 11, in which the maximum prediction value found corresponds to 0.045 and to the maximum value of the ROC curve. The area under the ROC curve corresponds to 0.544, which is considered low.

Analysis of SMORPH Model
It was verified that only 723 from a total of 9,577 cases diagnosed as positive by the model were confirmed, while in 8,854 cases the landslide indicated as positive did not occur. Given a negative diagnosis by the SMORPH model, we have 132 cases of occurrence, that is, even when the model results in a non-occurrence, 2.21% of landslides are still expected to occur.
Thus, "prevalence" means the hit ratio of occurrences related to the cases classified by the SMORPH model as positive (7.55%) and negative (2.21%). The risk occurrence corresponds to 8.17% when the model provides a positive diagnosis. Likewise, when the model provides a negative diagnosis, the risk is 2.26%.
As in the SHALSTAB, IPT and SAGA models, the results of the accuracy of occurrence of the SMORPH model were low.
In Fig. 12, the maximum prediction value found corresponds to 0.022 and the AUC corresponds to 0.621, which is considered a low value. Yet, it is the highest value of all models presented.

Analysis of Modified SMORPH Model
It was verified that, from a total of 11,372 cases, only 686 were confirmed by the model; while 10,686 landslide cases classified as positive by the model did not take place. Given a negative diagnosis by the modified SMORPH model, we have 169 cases of occurrences, that is, even when the model results in a non-occurrence, 4.05% of them are still expected to occur.
The odds correspond to 6.42% when the model provides a positive diagnosis. Similarly, the risk occurrence is 4.22% when the model provides a negative diagnosis.
When the result of the model for an occurrence is positive, the result is 15.5 times more likely not to occur than to occur. When it points to the negative of a landslide occurrence, the result is 23.6 times more likely not to occur than to occur.
For the modified SMORPH model, the area under the curve (Fig. 13) is 0.537 and its maximum value corresponds to 0.041.
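The AUC values quoted for the five models can be checked against the counts reported in the text. The sketch below assumes that each model acts as a single binary classifier, so that its ROC curve has one operating point between (0, 0) and (1, 1) and the area under it reduces to the mean of sensitivity and specificity. This is an assumption about how the curves were built, although it does reproduce the reported values.

```python
# Confirmed occurrences and non-occurrences reported in the text.
N_OCCURRENCES = 855
N_NON_OCCURRENCES = 14689

# (confirmed positives, unconfirmed positives) reported for each model.
POSITIVE_COUNTS = {
    "SHALSTAB": (717, 10874),
    "IPT": (823, 13103),
    "SAGA": (524, 7700),
    "SMORPH": (723, 8854),
    "modified SMORPH": (686, 10686),
}


def binary_auc(true_pos, false_pos):
    """Area under a one-point ROC curve of a binary diagnosis."""
    sensitivity = true_pos / N_OCCURRENCES
    specificity = 1 - false_pos / N_NON_OCCURRENCES
    # Trapezoids through (0, 0), (1 - specificity, sensitivity), (1, 1).
    return (sensitivity + specificity) / 2


aucs = {name: binary_auc(tp, fp) for name, (tp, fp) in POSITIVE_COUNTS.items()}
for name, auc in sorted(aucs.items(), key=lambda item: -item[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

Sorting by AUC places SMORPH first, in line with the comparison of predictive performance that follows.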
In order to streamline the comparison process between the results of odds ratios, Table 4 summarizes the analysis of the variation of risk occurrence, while Table 5 summarizes the analyzed predictions of the occurrences. According to the results, the models are presented in descending order in relation to their capacity to meet the requirement (that is, to point to the risk occurrence or to predict the occurrences).

The SMORPH model presents a higher predictive performance than the other models; however, this model could still be improved if it considered other variables, for instance, land use. When the SMORPH model is compared to the others, it is observed that it fails less and succeeds more.
The following analyses were performed with pairs of models for the concordance evaluation by the Kappa statistic, using the p-value as the descriptive significance level of the test. The significance level most frequently adopted is a p-value lower than 5%.
Using the occurrence and non-occurrence data, we sought to identify the concordance between each pair of models, that is, the extent to which both models predict the occurrence or both predict the non-occurrence.
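A minimal sketch of the concordance test for one pair of models is given below. It computes Cohen's kappa from the 2 × 2 table of paired diagnoses, a Z statistic using the large-sample standard error under the hypothesis of no agreement beyond chance, and an upper-tail p-value. The one-sided test is an assumption inferred from the Z and p-value pairs reported in this section, and the counts in the example are hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def cohen_kappa_test(both_pos, only_a, only_b, both_neg):
    """Cohen's kappa for two binary diagnoses, with Z and p-value.

    both_pos: cases diagnosed as positive by both models
    only_a:   positive for model A only
    only_b:   positive for model B only
    both_neg: negative for both models
    """
    n = both_pos + only_a + only_b + both_neg
    p_observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    a_neg, b_neg = 1 - a_pos, 1 - b_pos
    p_expected = a_pos * b_pos + a_neg * b_neg
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Standard error of kappa under the null hypothesis kappa = 0.
    correction = a_pos * b_pos * (a_pos + b_pos) + a_neg * b_neg * (a_neg + b_neg)
    se_null = math.sqrt(p_expected + p_expected ** 2 - correction) / (
        (1 - p_expected) * math.sqrt(n)
    )
    z = kappa / se_null
    return kappa, z, upper_tail_p(z)


# Hypothetical paired diagnoses for 50 cases.
kappa, z, p_value = cohen_kappa_test(both_pos=20, only_a=5, only_b=10, both_neg=15)
print(f"kappa = {kappa:.3f}, Z = {z:.3f}, p-value = {p_value:.5f}")
```

With this convention a negative Z gives a p-value above 0.5, as in the SHALSTAB and IPT comparison for occurrences (Z = −0.731, p-value = 0.7676).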
Based on the results of concordance analysis by Kappa statistic (Z = −0.731 and p-value = 0.7676) of SHALSTAB and IPT models, it is possible to conclude that there is no significant evidence of association and the concordance may be considered random or, in other words, casual. The concordance results between SHALSTAB and IPT diagnoses for non-occurrences by Kappa statistic were Z = 0.395 and p-value = 0.3464, which indicates a concordance considered random (casual).
The concordance of the SHALSTAB and SAGA models for the occurrences may be considered random (casual) by the Kappa statistic (Z = 0.9102, p-value = 0.1814). However, for the non-occurrences, there is a significant concordance between the diagnoses of these models, with Z = 3.3617 and p-value = 0.0003873.
After that, SHALSTAB and SMORPH models were analyzed and the concordance may be considered random (casual) for the occurrence diagnoses (Z = 1.346, p-value = 0.08915) and also for non-occurrence (Z = −6.1508, p-value = 1).
The concordance between the diagnoses from SAGA and modified SMORPH may be considered random (casual) for the occurrences (Z = 0.7697, p-value = 0.2207), but it is considered significant for the non-occurrences (Z = −21.008, p-value < 0.001).
The analysis between SMORPH and modified SMORPH models indicated a significant concordance between the models for the occurrences (Z = 4.6543, p-value < 0.001) and also for the non-occurrences (Z = 36.6583, p-value < 0.001).
The analyses presented here reflect the concordance between models in the diagnoses according to the type of occurrence. Therefore, it can be observed when two models indicate a positive prediction and what degree of concordance they had in the diagnoses generated. Table 6 summarizes the analysis results of concordance between the models, according to the type of occurrence.

Conclusion
Through the presented analysis, especially through ROC curves, this research concludes that the accuracy of the models is relevant; however, it is still low due to the large number of false positives indicated by all models. None of the landslide models considers anthropogenic variables, such as land use, in its analysis. When those variables were analyzed "in loco" in these landslide cases, they were verified as one of the most important determining factors contributing to the occurrence of geotechnical instability processes on the slopes. It is important to highlight that all 855 landslide cases were countersigned by civil engineers, who confirmed the validity of the information.
It may be observed that the predictive power of the studied models is only slightly different, ranging from 0.535 to 0.621, which demonstrates the need for new variables to be included in the structure of the models. For the area in question, the SMORPH model provided the most significant result, although barely. Substantial differentiation is not observed between models.
The results indicated various confirmed cases of landslides for which the models did not predict the existing risk. Thus, the analyses of the results aimed at identifying false positives and true negatives indicated high rates, reflecting conservative models whose variables require revision.
The research allowed us to verify the absence of a geographical recurrence relationship for a given occurrence when analyzed in the context of a historical series.
An event reported in the past does not necessarily imply a new occurrence. The fact that the analyzed place is an area of lower purchasing power points to a lack of retaining and drainage structures in particular areas, which could minimize landslide circumstances.
That being said, this research confirmed that the variables considered in these models do not respond properly to the phenomenon, indicating the need for a more detailed analysis that can incorporate the anthropogenic dynamics, as well as the social pattern of the area. The physical variables alone are unable to predict the risk adequately. The presented results support the adoption of the models studied, but with the inclusion of anthropogenic variables, which would increase the efficacy of the models. For this purpose, in-depth studies are recommended on the influence of the relief amplitude, geomorphology, lithology and its structural aspects, as well as land use.