Efficient Swell Risk Prediction for Building Design Using a Domain-Guided Machine Learning Model
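Main Author: | Hani S. Alharbi |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-07-01 |
Series: | Buildings |
Subjects: | clay content; expansive soils; machine learning; plasticity index; random forest; SHAP interpretability |
Online Access: | https://www.mdpi.com/2075-5309/15/14/2530 |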
author | Hani S. Alharbi |
---|---|
collection | DOAJ |
description | Expansive clays damage the foundations, slabs, and utilities of low- and mid-rise buildings, threatening daily operations and incurring billions of dollars in costs globally. This study pioneers a domain-informed machine learning framework, coupled with a collinearity-aware feature selection strategy, to predict soil swell potential solely from routine index properties. Following hard-limit filtering and Unified Soil Classification System (USCS) screening, 291 valid samples were extracted from a public dataset of 395 cases. A random forest benchmark model was developed using five correlated features, and a multicollinearity analysis based on the variance inflation factor (VIF) revealed exact linear dependence among the Atterberg limits. A parsimonious two-variable model, based solely on plasticity index (PI) and clay fraction (C), was retained. On an 80:20 stratified hold-out set, this simplified model reduced the root mean square error (RMSE) from 9.0% to 6.8% and the maximum residual from 42% to 16%. Bootstrap analysis confirmed a median RMSE of 7.5% with stable 95% prediction intervals. Shapley Additive Explanations (SHAP) analysis revealed that PI accounted for approximately 75% of the model’s influence and highlighted the critical swell surge beyond PI ≈ 55%. This work introduces a rule-based cleaning pipeline and collinearity-aware feature selection to derive a robust two-variable model that balances accuracy and interpretability, offering a lightweight, interpretable tool for foundation design, GIS zoning, and BIM workflows. |
format | Article |
id | doaj-art-9ca93e4d9f094940a3e8a8cffe4f0822 |
institution | Matheson Library |
issn | 2075-5309 |
language | English |
publishDate | 2025-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Buildings |
doi | 10.3390/buildings15142530 |
volume | 15 |
issue | 14 |
article_number | 2530 |
affiliation | Civil Engineering Department, College of Engineering, Shaqra University, Dawadmi 11911, Riyadh, Saudi Arabia |
title | Efficient Swell Risk Prediction for Building Design Using a Domain-Guided Machine Learning Model |
topic | clay content expansive soils machine learning plasticity index random forest SHAP interpretability |
url | https://www.mdpi.com/2075-5309/15/14/2530 |
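The abstract reports that a variance inflation factor (VIF) analysis exposed exact linear dependence among the Atterberg limits. The sketch below is a minimal illustration of why, not the author's code: the plasticity index is defined as PI = LL − PL (liquid limit minus plastic limit), so a model given LL, PL, and PI together carries one perfectly redundant column. It assumes NumPy, uses a hypothetical helper `vif` that computes VIF_j = 1 / (1 − R_j²) from an ordinary least-squares fit of each column on the others, and runs on randomly generated values rather than the study's dataset. The four-feature set (LL, PL, PI, C) is also an assumption; the record does not list the five benchmark features.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression of column j on all other columns plus an intercept.
    Returns np.inf when R_j^2 is numerically 1 (exact linear dependence).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # handles rank-deficient A
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # Treat R^2 within floating-point tolerance of 1 as exact dependence.
        out[j] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return out


# Hypothetical synthetic index properties (not the study's data).
rng = np.random.default_rng(0)
n = 200
LL = rng.uniform(40, 110, n)   # liquid limit, %
PL = rng.uniform(15, 35, n)    # plastic limit, %
PI = LL - PL                   # plasticity index, by definition LL - PL
C = rng.uniform(10, 70, n)     # clay fraction, %

full = np.column_stack([LL, PL, PI, C])
reduced = np.column_stack([PI, C])
print("VIF for LL, PL, PI, C:", vif(full))
print("VIF for PI, C:", vif(reduced))
```

With this synthetic data, LL, PL, and PI each report an infinite VIF because each is an exact linear combination of the other two, while the reduced PI and C pair stays close to 1. Dropping LL and PL mirrors the two-variable (PI, C) model retained in the study, although the author's exact selection procedure may differ.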
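The hold-out and bootstrap figures in the abstract (RMSE of 6.8% on the 80:20 split, median bootstrap RMSE of 7.5%) rest on the usual RMSE definition, the square root of the mean squared residual. The sketch below assumes a simple pairs bootstrap: resample (observed, predicted) swell pairs with replacement, recompute RMSE each time, and summarize the resulting distribution by its median and 2.5th/97.5th percentiles. The helpers `rmse` and `bootstrap_rmse`, the resample count, and the synthetic data are all hypothetical, since this record does not describe the study's resampling scheme. Note that a percentile interval on RMSE is not the same as the per-sample 95% prediction intervals mentioned in the abstract.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error between observed and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def bootstrap_rmse(y_true, y_pred, n_boot=2000, seed=0):
    """Pairs bootstrap of RMSE.

    Resamples (observation, prediction) pairs with replacement and returns
    the median RMSE and its (2.5th, 97.5th) percentile interval.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        scores[b] = rmse(y_true[idx], y_pred[idx])
    lo, med, hi = np.percentile(scores, [2.5, 50, 97.5])
    return med, (lo, hi)


# Hypothetical hold-out set (synthetic, not the study's data).
rng = np.random.default_rng(1)
obs = rng.uniform(0, 40, 60)           # observed swell potential, %
pred = obs + rng.normal(0, 7, 60)      # predictions with Gaussian error, sd = 7 points
print("hold-out RMSE:", round(rmse(obs, pred), 2))
med, (lo, hi) = bootstrap_rmse(obs, pred)
print(f"median bootstrap RMSE: {med:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

Resampling whole pairs keeps each prediction tied to its observation, so the spread of the bootstrap scores reflects how sensitive the error estimate is to which samples happen to land in the hold-out set.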