A novel framework embedding Bayesian-optimized ensemble machine learning and explainable artificial intelligence (XAI) to improve flood prediction in complex watersheds

Bibliographic Details
Main Authors: Md Gufran Alam, Vaibhav Tripathi, C.M. Bhatt, Mohit Prakash Mohanty
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Environmental and Sustainability Indicators
Online Access: http://www.sciencedirect.com/science/article/pii/S2665972725001813
Description
Summary: Floods are among the most common and destructive natural hazards, particularly in areas characterized by data scarcity and intricate geomorphological features. In such regions, accurate and interpretable flood susceptibility mapping is essential for effective risk reduction and informed urban planning. This study proposes a novel ensemble machine learning (ML) framework integrated with geomorphology-based flood conditioning factors (FCFs) to identify flood-prone areas over the Kosi Megafan in India, a large watershed with significant anabranching characteristics. A comprehensive set of 21 FCFs, including topographic, hydrologic, and stream-related indicators, was derived from FABDEM, satellite imagery, and ancillary datasets. After multicollinearity analysis and information gain ratio filtering, 15 key FCFs were selected for model training. Four base ML models, namely Random Forest, XGBoost, CatBoost, and Long Short-Term Memory, were optimized using Bayesian techniques and combined into a stacked ensemble classifier. The model was trained using historical flood extents from the Global Surface Water dataset and validated against Sentinel-1 SAR imagery. Results indicate that the ensemble model outperformed the individual classifiers, achieving the highest accuracy (90 %) and Cohen's Kappa (0.79). SHapley Additive exPlanations (SHAP) were used to enhance interpretability, highlighting elevation, rainfall, curve number, and drainage density as the most influential predictors. The final flood susceptibility map shows that 35.18 % of the megafan is very highly susceptible, aligning well with observed flood events. This interpretable and scalable framework holds strong potential for enhancing flood risk management in complex, data-scarce catchments, while also supporting global initiatives such as the Sendai Framework and Sustainable Development Goals (SDGs) 11 and 13.
ISSN: 2665-9727
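
Note on the metrics: the summary reports accuracy alongside Cohen's Kappa, which discounts the agreement a classifier would reach by chance. The sketch below shows how the two relate for a binary flooded / not-flooded labelling. It is a minimal illustration using only the Python standard library; the function names and the toy labels are assumptions made here for illustration and are not taken from the study or its data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples where the predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy) and p_e is the agreement
    expected by chance from the marginal class frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both labelings contain a single identical class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (hypothetical, NOT from the study): 1 = flooded, 0 = not flooded.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [1] + [0] * 9  # one miss in each class

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", cohen_kappa(y_true, y_pred))
```

With these balanced toy labels the chance agreement is 0.5, so an accuracy of 0.9 corresponds to a kappa of 0.8. On imbalanced labels the same accuracy would yield a lower kappa, which is why the two figures are usually reported together.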