Bridging the Gap: Limitations of Machine Learning in Real-World Prediction of Heavy Metal Accumulation in Rice in Hunan Province
Main Authors: | Qing-Qian Peng, Xia Zhou, Hang Zhou, Ye Liao, Zi-Yu Han, Lu Hu, Peng Zeng, Jiao-Feng Gu, Rong Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-06-01 |
Series: | Agronomy |
Online Access: | https://www.mdpi.com/2073-4395/15/6/1478 |
author | Qing-Qian Peng Xia Zhou Hang Zhou Ye Liao Zi-Yu Han Lu Hu Peng Zeng Jiao-Feng Gu Rong Zhang |
collection | DOAJ |
description | Cadmium (Cd) pollution poses a severe threat to rice safety and human health, and traditional linear models are markedly limited in predicting rice Cd accumulation because of environmental complexity. This study systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Residual Neural Network (ResNet) models using a multi-source soil–rice dataset of 57,200 samples from Hunan Province. The RF model performed best on the test set (R² = 0.62); its dominant features were soil available Cd (contribution 9.74%) and precipitation during the rice grain-filling stage (joint contribution 15.96%). However, its predictive performance declined sharply on an independent 2023 validation set of 393 samples from Yizhang County and Lengshuitan District, with R² values between −0.31 and −0.12, i.e., worse than simply predicting the mean of the observed values. This exposes the fundamental limitations of static, data-driven paradigms. Agronomic management measures, recorded as heterogeneous data and reduced to binary codes, failed to represent the actual intensity of the interventions. The study shows that although machine learning models captured nonlinear relationships in laboratory environments, they struggled to adapt to the dynamic interactions and spatiotemporal heterogeneity of farmland systems. Future work should focus on hybrid models guided by mechanistic insight that integrate dynamic environmental processes and real-time data, and on localized “one model per region” strategies to improve predictive robustness. The study offers methodological insights for the technological transformation of agricultural artificial intelligence and argues that deep integration of data-driven approaches with mechanistic understanding is crucial for overcoming the “last mile” challenge. |
id | doaj-art-98e749e2d44d4cf4a68e9a25745087fa |
institution | Matheson Library |
issn | 2073-4395 |
doi | 10.3390/agronomy15061478 |
citation | Agronomy 2025, 15(6), 1478 |
affiliations | College of Ecology and Environment Sciences, Central South University of Forestry and Technology, Changsha 410004, China; Technical Center for Soil, Agricultural and Rural Ecological Environment, Ministry of Ecology and Environment, Beijing 100012, China; Hunan Provincial Soil Pollution Remediation and Carbon Fixation Engineering Technology Research Center, Changsha 410004, China |
topic | machine learning cadmium pollution prediction model spatiotemporal extrapolation agricultural environmental complexity |
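The negative validation R² values reported in the description can be read through the definition of the coefficient of determination, R² = 1 − SS_res/SS_tot, where SS_tot is computed around the mean of the observed values: R² drops below zero whenever a model's squared errors exceed those of always predicting that mean. The Python sketch below assumes this standard definition (the one used by, for example, scikit-learn's `r2_score`); the record does not say which implementation the authors used, and the function name `r_squared` and the Cd values are hypothetical illustrations, not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    SS_tot is measured around the mean of y_true, so a constant prediction
    equal to that mean scores 0 and anything worse scores below 0.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the observed mean.
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical grain Cd values (mg/kg), invented for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]

# Predictions that track the observations closely.
print(round(r_squared(observed, [0.12, 0.18, 0.33, 0.37]), 3))  # prints 0.948

# Predictions biased upward, as might happen when a model meets conditions
# unlike its training data: worse than predicting the mean, so R^2 < 0.
print(round(r_squared(observed, [0.35, 0.35, 0.35, 0.35]), 3))  # prints -0.8
```

Because SS_tot is taken over the validation samples alone, a model can score well on a random test split drawn from its training distribution yet fall below zero on a new county or season whose values it systematically misses.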