Predictive Models for Environmental Perception in Multi-Type Parks and Their Generalization Ability: Integrating Pre-Training and Reinforcement Learning

Evaluating the environmental perception of urban parks is highly significant for optimizing urban planning. To address the limitations of traditional evaluation methods, a multimodal deep learning framework that integrates pre-training and reinforcement learning strategies for the comprehensive asse...

Full description

Saved in:
Bibliographic Details
Main Authors: Kangen Chen, Tao Xia, Zhoutong Cao, Yiwen Li, Xiuhong Lin, Rushan Bai
Format: Article
Language:English
Published: MDPI AG 2025-07-01
Series:Buildings
Subjects:
Online Access:https://www.mdpi.com/2075-5309/15/13/2364
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Evaluating the environmental perception of urban parks is highly significant for optimizing urban planning. To address the limitations of traditional evaluation methods, a multimodal deep learning framework that integrates pre-training and reinforcement learning strategies for the comprehensive assessment of various park types (seaside, urban, mountain, and wetland) across three dimensions—accessibility, usability, and aesthetics—is proposed herein. By combining image data and user review texts, a unified architecture is constructed, including a text encoder, image visual encoder, and multimodal fusion module. During the pre-training phase, the model captured latent features in images and texts through a self-supervised learning strategy. In the subsequent training phase, a reinforcement learning strategy was introduced to optimize the sample selection and modal fusion paths to enhance the model’s generalization capability. To validate the cross-type prediction ability of the model, the experimental design uses data from three types of parks for training, with the remaining type as a test set. Results demonstrate that the proposed method outperforms LSTM and CNN architectures across accuracy, precision, recall, and F1 Score metrics. Compared with CNN, the proposed method improves accuracy by 5.1% and F1 Score by 6.6%. Further analysis shows that pre-training enhances the robust fusion of visual and textual features, while reinforcement learning optimizes the sample selection and feature fusion strategies during training.
ISSN:2075-5309