Predictive Models for Environmental Perception in Multi-Type Parks and Their Generalization Ability: Integrating Pre-Training and Reinforcement Learning
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-07-01 |
Series: | Buildings |
Subjects: | |
Online Access: | https://www.mdpi.com/2075-5309/15/13/2364 |
Summary: | Evaluating the environmental perception of urban parks is important for optimizing urban planning. To address the limitations of traditional evaluation methods, this study proposes a multimodal deep learning framework that integrates pre-training and reinforcement learning to assess four park types (seaside, urban, mountain, and wetland) along three dimensions: accessibility, usability, and aesthetics. The framework combines image data and user review texts in a unified architecture comprising a text encoder, a visual encoder for images, and a multimodal fusion module. During pre-training, the model captures latent features in images and texts through a self-supervised learning strategy. In the subsequent training phase, a reinforcement learning strategy optimizes sample selection and modal fusion paths to improve the model's generalization capability. To validate cross-type prediction, the model is trained on data from three park types and tested on the remaining type. The proposed method outperforms LSTM and CNN architectures on accuracy, precision, recall, and F1 score; compared with CNN, it improves accuracy by 5.1% and F1 score by 6.6%. Further analysis shows that pre-training makes the fusion of visual and textual features more robust, while reinforcement learning optimizes the sample selection and feature fusion strategies during training. |
---|---|
ISSN: | 2075-5309 |
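
The cross-type evaluation described in the summary (train on three park types, test on the remaining one) can be illustrated with a minimal sketch. This is not the authors' code: the record does not say whether the held-out type is rotated across all four park types or whether the metrics are macro-averaged, so both are assumptions here. The names `leave_one_type_out`, `macro_metrics`, `majority_label`, and the toy `SAMPLES` records are hypothetical, and a majority-class guess stands in for the actual multimodal model.

```python
from collections import Counter

# Park types named in the summary.
PARK_TYPES = ("seaside", "urban", "mountain", "wetland")


def leave_one_type_out(samples, park_types=PARK_TYPES):
    """Yield (held_out_type, train, test), holding out one park type at a time.

    Assumption: the held-out type is rotated over all types; the record only
    states that three types are used for training and one for testing.
    """
    for held_out in park_types:
        train = [s for s in samples if s["park_type"] != held_out]
        test = [s for s in samples if s["park_type"] == held_out]
        yield held_out, train, test


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def majority_label(train):
    """Stand-in predictor: the most frequent training label (not the paper's model)."""
    return Counter(s["label"] for s in train).most_common(1)[0][0]


# Hypothetical toy records; real inputs would pair park images with review texts.
SAMPLES = [
    {"park_type": "seaside", "label": 1},
    {"park_type": "seaside", "label": 0},
    {"park_type": "urban", "label": 1},
    {"park_type": "urban", "label": 1},
    {"park_type": "mountain", "label": 0},
    {"park_type": "mountain", "label": 1},
    {"park_type": "wetland", "label": 1},
    {"park_type": "wetland", "label": 0},
]

if __name__ == "__main__":
    for held_out, train, test in leave_one_type_out(SAMPLES):
        guess = majority_label(train)
        y_true = [s["label"] for s in test]
        y_pred = [guess] * len(test)
        print(held_out, macro_metrics(y_true, y_pred))
```

Splitting by park type rather than at random keeps every test sample from a park type the model never saw in training, which is what makes the reported scores a measure of cross-type generalization.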