Bridging the Gap: Limitations of Machine Learning in Real-World Prediction of Heavy Metal Accumulation in Rice in Hunan Province

Bibliographic Details
Main Authors: Qing-Qian Peng, Xia Zhou, Hang Zhou, Ye Liao, Zi-Yu Han, Lu Hu, Peng Zeng, Jiao-Feng Gu, Rong Zhang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Agronomy
Online Access: https://www.mdpi.com/2073-4395/15/6/1478
Collection: DOAJ
Abstract: Cadmium (Cd) pollution poses a severe threat to rice safety and human health, while traditional linear models exhibit significant limitations in predicting rice Cd accumulation due to environmental complexities. This study systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Residual Neural Networks (ResNet), using a multi-source soil–rice dataset comprising 57,200 samples from Hunan Province. The results showed that the RF model performed best on the test set (R² = 0.62), with the dominant features being soil’s available Cd (contributing 9.74%) and precipitation during the rice-filling stage (joint contribution of 15.96%). However, the model’s predictive performance experienced a sharp decline on the independent 2023 validation set comprising 393 samples from Yizhang County and Lengshuitan District, with R² values ranging from −0.12 to −0.31. This highlighted the fundamental limitations of static data-driven paradigms. Agronomic management measures, simplified by heterogeneous data and binary encoding, failed to effectively represent the actual intervention intensity. The study demonstrated that while machine learning models captured nonlinear relationships in laboratory environments, they struggled to adapt to the dynamic interactions and spatiotemporal heterogeneity of farmland systems. Future efforts should focus on developing hybrid models guided by mechanistic insights, integrating dynamic environmental processes and real-time data, and promoting localized “one model per region” strategies to enhance predictive robustness. This study provides methodological insights for the technological transformation of agricultural artificial intelligence, emphasizing that the deep integration of data-driven approaches and mechanistic understanding is crucial for overcoming the “last mile” challenge.
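Negative R² values on an independent validation set, as reported in the abstract, follow directly from the standard definition of the coefficient of determination, R² = 1 − SS_res/SS_tot (the form implemented by, for example, scikit-learn's `r2_score`). SS_tot is computed around the mean of the evaluation set itself, so any model whose squared error exceeds that of "always predict this set's mean" scores below zero. The Python sketch below is an illustration only: the function name `r_squared` and all grain-Cd values are hypothetical and are not the authors' code or data. It shows that predictions which preserve the ranking of samples but carry a systematic offset, as might occur when a model trained in one region is applied to another, can still yield a strongly negative R².

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    SS_tot is taken around the mean of the evaluation set itself, so R^2 drops
    below zero whenever the predictions have a larger squared error than simply
    predicting that set's own mean.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = fmean(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical grain-Cd observations (mg/kg) for a new validation region.
    observed = [0.10, 0.20, 0.30, 0.40]

    # Predictions close to the observations.
    close = [0.12, 0.18, 0.33, 0.37]
    # Same ranking, but every value shifted up by 0.15 mg/kg, e.g. a bias
    # carried over from the region the model was trained on.
    shifted = [y + 0.15 for y in observed]
    # Baseline: always predict the validation set's own mean (R^2 = 0).
    mean_only = [fmean(observed)] * len(observed)

    for name, preds in [("close", close), ("shifted", shifted), ("mean only", mean_only)]:
        print(f"{name:>9}: R^2 = {r_squared(observed, preds):.3f}")
```

The shifted predictions are perfectly correlated with the observations, yet their R² is about −0.8, because the constant offset inflates SS_res (0.09) well beyond SS_tot (0.05). Under these assumptions, a negative score on a new region signals a shift between training and validation conditions rather than merely a noisier fit.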
ISSN: 2073-4395
Record ID: doaj-art-98e749e2d44d4cf4a68e9a25745087fa
Institution: Matheson Library
DOI: 10.3390/agronomy15061478
Volume/Issue: Agronomy 15(6), article 1478
Keywords: machine learning; cadmium pollution; prediction model; spatiotemporal extrapolation; agricultural environmental complexity
Affiliations:
Qing-Qian Peng, Xia Zhou, Hang Zhou, Ye Liao, Peng Zeng, Jiao-Feng Gu: College of Ecology and Environment Sciences, Central South University of Forestry and Technology, Changsha 410004, China
Zi-Yu Han, Rong Zhang: Technical Center for Soil, Agricultural and Rural Ecological Environment, Ministry of Ecology and Environment, Beijing 100012, China
Lu Hu: Hunan Provincial Soil Pollution Remediation and Carbon Fixation Engineering Technology Research Center, Changsha 410004, China