Efficient Swell Risk Prediction for Building Design Using a Domain-Guided Machine Learning Model

Expansive clays damage the foundations, slabs, and utilities of low- and mid-rise buildings, threatening daily operations and incurring billions of dollars in costs globally. This study pioneers a domain-informed machine learning framework, coupled with a collinearity-aware feature selection strateg...

Bibliographic Details
Main Author: Hani S. Alharbi
Format: Article
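Language: English
Published: MDPI AG 2025-07-01
Series: Buildings
Subjects: clay content; expansive soils; machine learning; plasticity index; random forest; SHAP interpretability
Online Access: https://www.mdpi.com/2075-5309/15/14/2530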
author Hani S. Alharbi
collection DOAJ
description Expansive clays damage the foundations, slabs, and utilities of low- and mid-rise buildings, threatening daily operations and incurring billions of dollars in costs globally. This study pioneers a domain-informed machine learning framework, coupled with a collinearity-aware feature selection strategy, to predict soil swell potential solely from routine index properties. Following hard-limit filtering and Unified Soil Classification System (USCS) screening, 291 valid samples were extracted from a public dataset of 395 cases. A random forest benchmark model was developed using five correlated features. A multicollinearity analysis based on the variance inflation factor (VIF) revealed exact linear dependence among the Atterberg limits, as expected because the plasticity index (PI) is by definition the liquid limit minus the plastic limit. A parsimonious two-variable model, based solely on PI and clay fraction (C), was therefore retained. On an 80:20 stratified hold-out set, this simplified model reduced the root mean square error (RMSE) from 9.0% to 6.8% and the maximum residual from 42% to 16%. Bootstrap analysis confirmed a median RMSE of 7.5% with stable 95% prediction intervals. Shapley Additive Explanations (SHAP) analysis revealed that PI accounted for approximately 75% of the model’s influence and highlighted a critical swell surge beyond PI ≈ 55%. The rule-based cleaning pipeline and collinearity-aware feature selection together yield a robust two-variable model that balances accuracy and interpretability, providing a lightweight tool for foundation design, GIS zoning, and BIM workflows.
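The collinearity finding in the description can be illustrated with a minimal Python sketch. It is not the article's code or dataset: the soil values are synthetic, the helper names (`solve_linear`, `r_squared`, `vif`) are hypothetical, and the feature sets are chosen only for illustration. It shows that when PI is computed as liquid limit (LL) minus plastic limit (PL), each of the three Atterberg limits has an unbounded VIF, whereas the reduced pair PI and C has a small, finite VIF.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular (the predictors are not themselves collinear)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of a least-squares fit of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(n)]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(features, name, tol=1e-8):
    """VIF = 1 / (1 - R^2), regressing one feature on all the others.
    Returns infinity when the feature is an exact linear combination."""
    others = [values for key, values in features.items() if key != name]
    r2 = r_squared(features[name], others)
    return math.inf if 1.0 - r2 < tol else 1.0 / (1.0 - r2)


# Synthetic index properties (made-up ranges, for illustration only).
rng = random.Random(0)
LL = [rng.uniform(30, 110) for _ in range(200)]   # liquid limit, %
PL = [rng.uniform(15, 30) for _ in range(200)]    # plastic limit, %
PI = [ll - pl for ll, pl in zip(LL, PL)]          # plasticity index = LL - PL
C = [20 + 0.4 * pi + rng.gauss(0, 10) for pi in PI]  # clay fraction, %

atterberg = {"LL": LL, "PL": PL, "PI": PI}
reduced = {"PI": PI, "C": C}

atterberg_vif = {name: vif(atterberg, name) for name in atterberg}
reduced_vif = {name: vif(reduced, name) for name in reduced}

print("Atterberg limits:", atterberg_vif)
print("Reduced model:   ", reduced_vif)
```

Keeping PI and dropping LL and PL removes the exact dependence while retaining the variable that the description reports as most influential; with only two predictors, both VIF values are equal by construction.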
format Article
id doaj-art-9ca93e4d9f094940a3e8a8cffe4f0822
institution Matheson Library
issn 2075-5309
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Buildings
spelling doaj-art-9ca93e4d9f094940a3e8a8cffe4f0822; 2025-07-25T13:17:36Z; eng; MDPI AG; Buildings; ISSN 2075-5309; 2025-07-01; vol. 15, no. 14, article 2530; DOI 10.3390/buildings15142530; Hani S. Alharbi, Civil Engineering Department, College of Engineering, Shaqra University, Dawadmi 11911, Riyadh, Saudi Arabia
title Efficient Swell Risk Prediction for Building Design Using a Domain-Guided Machine Learning Model
topic clay content
expansive soils
machine learning
plasticity index
random forest
SHAP interpretability
url https://www.mdpi.com/2075-5309/15/14/2530