Bridging the Gap: Limitations of Machine Learning in Real-World Prediction of Heavy Metal Accumulation in Rice in Hunan Province

Bibliographic Details
Main Authors: Qing-Qian Peng, Xia Zhou, Hang Zhou, Ye Liao, Zi-Yu Han, Lu Hu, Peng Zeng, Jiao-Feng Gu, Rong Zhang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Agronomy
Subjects:
Online Access: https://www.mdpi.com/2073-4395/15/6/1478
Description
Summary: Cadmium (Cd) pollution poses a severe threat to rice safety and human health, while traditional linear models exhibit significant limitations in predicting rice Cd accumulation due to environmental complexities. This study systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Residual Neural Networks (ResNet), using a multi-source soil–rice dataset comprising 57,200 samples from Hunan Province. The results showed that the RF model performed best on the test set (R² = 0.62), with the dominant features being soil’s available Cd (contributing 9.74%) and precipitation during the rice-filling stage (joint contribution of 15.96%). However, the model’s predictive performance experienced a sharp decline on the independent 2023 validation set comprising 393 samples from Yizhang County and Lengshuitan District, with R² values ranging from −0.12 to −0.31. This highlighted the fundamental limitations of static data-driven paradigms. Agronomic management measures, simplified by heterogeneous data and binary encoding, failed to effectively represent the actual intervention intensity. The study demonstrated that while machine learning models captured nonlinear relationships in laboratory environments, they struggled to adapt to the dynamic interactions and spatiotemporal heterogeneity of farmland systems. Future efforts should focus on developing hybrid models guided by mechanistic insights, integrating dynamic environmental processes and real-time data, and promoting localized “one model per region” strategies to enhance predictive robustness. This study provides methodological insights for the technological transformation of agricultural artificial intelligence, emphasizing that the deep integration of data-driven approaches and mechanistic understanding is crucial for overcoming the “last mile” challenge.
ISSN: 2073-4395
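
Note on the negative R² values reported in the summary: R² is commonly defined as 1 − SS_res / SS_tot, so it drops below zero whenever a model's squared error on a dataset exceeds that of simply predicting the dataset's mean. The sketch below illustrates this with made-up numbers; the function name `r_squared` and all values are hypothetical and are not taken from the article or its data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical grain Cd concentrations (illustrative values only).
observed = [0.10, 0.20, 0.30, 0.40]

# Predictions that track the observations closely: R² near 1.
close_fit = [0.12, 0.18, 0.33, 0.37]

# Predictions from a model whose learned pattern no longer holds
# (for example, a different region or year): R² below 0.
shifted = [0.45, 0.40, 0.15, 0.10]

# Always predicting the observed mean gives R² of 0, the baseline.
mean_only = [0.25] * len(observed)

print("close fit:", r_squared(observed, close_fit))
print("shifted:  ", r_squared(observed, shifted))
print("mean only:", r_squared(observed, mean_only))
```

Read this way, validation R² values between −0.12 and −0.31 indicate predictions that were somewhat worse than a constant mean predictor for the 2023 samples.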