TY - JOUR
T1 - Landslide Modeling in a Tropical Mountain Basin Using Machine Learning Algorithms and Shapley Additive Explanations
AU - Vega, Johnny
AU - Sepúlveda-Murillo, Fabio Humberto
AU - Parra, Melissa
N1 - Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Sincere thanks to the “High Level Training Program for Full-time Professors in their own Doctorates” of the Academic and Research Vice Rector's Offices of the University of Medellín and the National Doctorate Program for Teachers of Higher Education Institutions of the Ministry of Science, Technology and Innovation. Data were provided by the Research Program “Vulnerability, resilience and risk of communities and supplying basins affected by landslides and avalanches,” code 1118-852-71251, project “Functions for vulnerability assessment due to water shortages by landslides and avalanches: micro-basins of southwest Antioquia,” contract 80740-492-2020 held between Fiduprevisora and the Universidad de Medellín, with resources from the National Financing Fund for science, technology, and innovation, “Francisco José de Caldas.”
Publisher Copyright:
© The Author(s) 2023.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Landslides are a geological hazard commonly induced by rainfall, earthquakes, deforestation, or human activity. They cause loss of human life every year, especially in highlands and on mountain slopes, with serious impacts that threaten communities and their infrastructure. The incidence and recurrence of landslides are conditioned by several factors related to soil properties, geological structure, climatic conditions, soil cover, and water flow. Colombia is one of the countries most affected by this type of natural hazard, as well as by floods, since these are the natural phenomena that pose the most severe risks to communities. In this work, we combined a statistical analysis of the landslide conditioning factors, Machine Learning Algorithms (MLA), and a Geographic Information System (GIS) to evaluate a flexible and agile methodology for estimating landslide susceptibility and delineating areas prone to landslide occurrence. The MLA were validated in a case study in the “La Liboriana” River basin, located in the Municipality of Salgar in the Colombian Andes, where Landslide Susceptibility Maps (LSMs) were obtained. The MLA results hold considerable potential for regional landslide mapping and can support the development of strategies to minimize impacts on human lives, infrastructure, and the natural environment. Based on these findings, proactive measures can be devised to safeguard vulnerable areas, mitigate risks, and protect communities. Seven supervised MLA were employed: two regression algorithms (Logistic) and five decision tree algorithms (Recursive Partitioning and Regression Trees [RPART], Conditional Inference Trees [CTREE], Random Forest [RF], Ranger, and Extreme Gradient Boosting Algorithm [XGBoost]). An LSM was produced for each MLA.
Considering different performance metrics, the RF model yields the best classification performance, with an area under the receiver operating characteristic (ROC) curve of 95% and an accuracy of 90%, providing the most representative results. Finally, the contribution of each landslide conditioning factor to the RF model's predictions is explained using the SHAP method.
KW - Colombian Andes
KW - landslides
KW - machine learning
KW - SHAP
KW - statistical methods
KW - susceptibility
UR - http://www.scopus.com/inward/record.url?scp=85169702391&partnerID=8YFLogxK
U2 - 10.1177/11786221231195824
DO - 10.1177/11786221231195824
M3 - Article
AN - SCOPUS:85169702391
SN - 1178-6221
VL - 16
JO - Air, Soil and Water Research
JF - Air, Soil and Water Research
ER -