Potential of random forest machine learning algorithm for geological mapping using PALSAR and Sentinel-2A remote sensing data: A case study of Tsagaan-uul area, southern Mongolia
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2025-12-01 |
Series: | Journal of Asian Earth Sciences: X |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2590056025000155 |
Summary: | Geological mapping in remote and geologically complex regions can be substantially improved by integrating remote sensing data with machine learning algorithms. This study evaluates the effectiveness of the Random Forest algorithm for geological mapping in the Tsagaan-uul area of the Khatanbulag ancient massif, Mongolia, a region characterized by limited accessibility and sparse field data. A comprehensive set of predictor variables was used, including Sentinel-2A spectral bands and indices, the ALOS PALSAR digital elevation model (DEM), and terrain morphometric features. Two distinct training strategies were employed: (1) one based on a geological map and (2) one based on field-collected rock samples from two lithologically diverse formations. Variable importance was assessed using the Mean Decrease Gini index, while classification performance was measured through overall accuracy, precision, recall, F1-score, and the Kappa coefficient. In the first experiment, the ALOS PALSAR DEM and the Terrain Ruggedness Index were identified as the most influential predictors. Overall accuracy across all nine models ranged from 59.9 % to 64.4 %, with Kappa coefficients between 0.508 and 0.562. Model 1, which used a 90–10 % split, achieved the highest performance, while Model 4 recorded the lowest. These results suggest that the data split ratio had a greater impact on model accuracy than the number of decision trees. In the second experiment, variations in the number of trees and variables per split had minimal effects, whereas the choice of stratification method significantly affected model outcomes. Overall, the findings emphasize the critical role of dataset configuration, such as class balance and representative sampling, in optimizing Random Forest-based geological mapping. |
---|---|
ISSN: | 2590-0560 |
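
The summary reports overall accuracy, precision, recall, F1-score, and the Kappa coefficient. As a minimal sketch of how these metrics relate to a confusion matrix, the following Python computes them from scratch. The function name `classification_metrics` and the two-class matrix `example_cm` are hypothetical and purely illustrative; they are not the authors' code or data.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy, Cohen's kappa, and per-class precision/recall/F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (hypothetical layout chosen for this sketch).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]                           # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    # Overall accuracy: fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(true_counts[i] * pred_counts[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    precision = [cm[i][i] / pred_counts[i] if pred_counts[i] else 0.0 for i in range(k)]
    recall = [cm[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(k)]
    f1 = [
        2 * p * r / (p + r) if (p + r) else 0.0
        for p, r in zip(precision, recall)
    ]
    return {
        "accuracy": observed,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Synthetic two-class example (not taken from the study).
example_cm = [[40, 10], [20, 30]]
metrics = classification_metrics(example_cm)
print(metrics)
```

Kappa discounts the agreement expected by chance, which is why it is lower than overall accuracy for the same confusion matrix.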