Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach

Bibliographic Details
Main Authors: Ali Suliman AlSalehy, Mike Bailey
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Smart Cities
Online Access: https://www.mdpi.com/2624-6511/8/3/90
Description
Summary: Effectively managing carbon monoxide (CO) pollution in complex industrial cities like Jubail remains challenging due to the diversity of emission sources and local environmental dynamics. This study analyzes spatiotemporal CO patterns and builds accurate predictive models using five years (2018–2022) of data from ten monitoring stations, combined with meteorological variables. Exploratory analysis revealed distinct diurnal and moderate weekly CO cycles, with prevailing northwesterly winds shaping dispersion. Spatial correlation of CO was low (average 0.14), suggesting strong local sources, unlike temperature (0.92) and wind (0.5–0.6), which showed higher spatial coherence. Seasonal Trend decomposition (STL) confirmed stronger seasonality in meteorological factors than in CO levels. Low wind speeds were associated with elevated CO concentrations. Key predictive features, such as 3-h rolling mean and median values of CO, dominated feature importance. Spatiotemporal analysis highlighted persistent hotspots in industrial areas and unexpectedly high levels in some residential zones. A range of models was tested, with ensemble methods (Extreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost)) achieving the best performance (R² > 0.95) and XGBoost producing the lowest Root Mean Squared Error (RMSE) of 0.0371 ppm. This work enhances understanding of CO dynamics in complex urban–industrial areas, providing accurate predictive models (R² > 0.95) and highlighting the importance of local sources and temporal patterns for improving air quality forecasts.
ISSN: 2624-6511
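
Illustrative note: the summary names 3-h rolling mean and median values of CO as the dominant predictive features. The sketch below shows one way such features could be derived from an hourly series. It is not the authors' code: the function name `rolling_features`, the trailing-window convention (the window ends at, and includes, the current hour), and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, median

def rolling_features(hourly_co, window=3):
    """Return (rolling_mean, rolling_median) over a trailing window.

    Assumption: the window ends at the current hour and includes it.
    Positions with fewer than `window` readings so far get None, so no
    feature is computed from an incomplete window.
    """
    means, medians = [], []
    for i in range(len(hourly_co)):
        if i + 1 < window:
            means.append(None)
            medians.append(None)
            continue
        chunk = hourly_co[i + 1 - window : i + 1]
        means.append(mean(chunk))
        medians.append(median(chunk))
    return means, medians

# Hypothetical hourly CO readings in ppm (illustrative values, not study data).
co_ppm = [0.25, 0.5, 0.75, 1.0, 0.5]
roll_mean, roll_median = rolling_features(co_ppm)
```

If features like these feed a model that forecasts the next hour, the window would typically be shifted back by one step so that it excludes the value being predicted; whether the study did so is not stated in the summary.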