Potential of random forest machine learning algorithm for geological mapping using PALSAR and Sentinel-2A remote sensing data: A case study of Tsagaan-uul area, southern Mongolia

Bibliographic Details
Main Authors: Munkhsuren Badrakh, Narantsetseg Tserendash, Erdenejargal Choindonjamts, Gáspár Albert
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Journal of Asian Earth Sciences: X
Online Access: http://www.sciencedirect.com/science/article/pii/S2590056025000155
Description
Summary: Geological mapping in remote and geologically complex regions can be substantially improved by integrating remote sensing data with machine learning algorithms. This study evaluates the effectiveness of the Random Forest algorithm for geological mapping in the Tsagaan-uul area of the Khatanbulag ancient massif, Mongolia, a region characterized by limited accessibility and sparse field data. A comprehensive set of predictor variables was used, including Sentinel-2A spectral bands and indices, the ALOS PALSAR digital elevation model (DEM), and terrain morphometric features. Two distinct training strategies were employed: (1) training based on a geological map and (2) training based on field-collected rock samples from two lithologically diverse formations. Variable importance was assessed using the Mean Decrease Gini index, while classification performance was measured through overall accuracy, precision, recall, F1-score, and the Kappa coefficient. In the first experiment, the ALOS PALSAR DEM and the Terrain Ruggedness Index were identified as the most influential predictors. Overall accuracy across all nine models ranged from 59.9 % to 64.4 %, with Kappa coefficients between 0.508 and 0.562. Model 1, which used a 90–10 % split, achieved the highest performance, while Model 4 recorded the lowest. These results suggest that the data split ratio had a greater impact on model accuracy than the number of decision trees. In the second experiment, variations in the number of trees and variables per split had minimal effects, whereas the choice of stratification method significantly affected model outcomes. Overall, the findings emphasize the critical role of dataset configuration, such as class balance and representative sampling, in optimizing Random Forest-based geological mapping.
ISSN: 2590-0560
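
Note on the accuracy measures named in the summary: overall accuracy, precision, recall, F1-score, and the Kappa coefficient can all be derived from a confusion matrix. The sketch below illustrates the standard definitions only; it is not the authors' code. The function name `classification_metrics` and the three-class matrix are hypothetical, and the numbers are made up for illustration, not taken from the article.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy measures from a square confusion matrix.

    cm[i][j] is the number of samples whose reference class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]  # reference totals per class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    correct = sum(cm[k][k] for k in range(n))

    # Overall accuracy: share of samples on the matrix diagonal.
    overall_accuracy = correct / total

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / (total * total)
    kappa = 0.0 if expected == 1 else (overall_accuracy - expected) / (1 - expected)

    precision, recall, f1 = [], [], []
    for k in range(n):
        p = cm[k][k] / col_sums[k] if col_sums[k] else 0.0
        r = cm[k][k] / row_sums[k] if row_sums[k] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if (p + r) else 0.0)

    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical three-class confusion matrix (illustrative values only).
example = [
    [40, 5, 5],
    [10, 30, 10],
    [5, 5, 40],
]
result = classification_metrics(example)
print(f"overall accuracy = {result['overall_accuracy']:.3f}")
print(f"kappa            = {result['kappa']:.3f}")
```

For this made-up matrix, 110 of 150 samples lie on the diagonal and the chance agreement is one third, so the overall accuracy is about 0.733 and Kappa is 0.600. Kappa falls below overall accuracy because it discounts the agreement expected by chance, which matches the pattern in the values the summary reports.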