Worldwide estimation of monthly global and diffuse horizontal irradiation via machine learning

Bibliographic Details
Main Authors: Bilal Rinchi, Aisha Al-Iter, Anas Qawasmeh, Mustafa Alharbi, Sameer Al-Dahidi, Osama Ayadi, Mohammad Alrbai
Format: Article
Language: English
Published: Elsevier, 2025-07-01
Series: Energy Conversion and Management: X
Online Access:http://www.sciencedirect.com/science/article/pii/S2590174525003010
Description
Summary: Accurate prediction of solar irradiation is critical for photovoltaic system design, energy forecasting, and planning. This study evaluates the performance of seven machine learning models in predicting Global Horizontal Irradiation (GHI) and Diffuse Horizontal Irradiation (DHI) using a monthly-resolution dataset sourced from the Photovoltaic Geographical Information System (PVGIS), covering 721 globally distributed locations. For each prediction task, five input features were drawn from a six-feature dataset. The Extreme Gradient Boosting (XGB) model achieved the highest overall performance, with test-set coefficients of determination (R²) of 96.88% for GHI and 97.51% for DHI. Seasonal analysis showed the highest accuracy between April and June and larger errors in the first and last quarters of the year. A full feature-combination analysis evaluated all 31 possible non-empty subsets of the five inputs (2^5 − 1 = 31) for each prediction task. The results confirmed that including all features produced the best performance, but they also revealed that the most influential inputs depend on the prediction target: for DHI prediction, GHI was more important than temperature, while for GHI prediction, excluding either had minimal impact. Latitude and the month number consistently appeared in the top-performing combinations, highlighting the importance of spatial and seasonal inputs. Satellite-based validation across three cities showed that model accuracy was highly location-dependent and demonstrated the value of evaluating multiple performance metrics. In ground-based validation using in-situ measurements from Amman, Jordan, the model rankings shifted: the Random Forest model achieved the highest accuracy (R² of 95.09%) despite limited inputs. These findings support the use of global machine learning models while emphasizing the need for regional assessment.
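The count of 31 input subsets comes from the five inputs per task. Every non-empty subset of five features gives 2^5 − 1 = 31 combinations. The Python sketch below is not the authors' code. It enumerates those subsets and computes R² using the standard definition 1 − SS_res/SS_tot. The feature names are assumptions: the summary confirms latitude, month number, temperature, and GHI as inputs for the DHI task, but "longitude" is a hypothetical fifth feature. Training a model on each subset, for example with XGB, is left out.

```python
from itertools import combinations

# Hypothetical input names for the DHI task. Latitude, month, temperature and
# GHI are named in the summary; "longitude" is an assumed fifth feature.
FEATURES = ["latitude", "longitude", "month", "temperature", "GHI"]


def all_input_subsets(features):
    """Return every non-empty subset of `features`, ordered by size."""
    subsets = []
    for k in range(1, len(features) + 1):
        subsets.extend(combinations(features, k))
    return subsets


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


subsets = all_input_subsets(FEATURES)
print(len(subsets))  # 31, i.e. 2**5 - 1
```

In a full analysis, each of the 31 subsets would select the columns used to train and test one model, and the subsets would then be ranked by test-set R². The abstract reports R² as a percentage, which is the value above multiplied by 100.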
ISSN: 2590-1745