HierLabelNet: A Two-Stage LLMs Framework with Data Augmentation and Label Selection for Geographic Text Classification
Earth observation data serve as a fundamental resource in Earth system science. The rapid advancement of remote sensing and in situ measurement technologies has led to the generation of massive volumes of data, accompanied by a growing body of geographic textual information. Efficient and accurate c...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2025-07-01
|
Series: | ISPRS International Journal of Geo-Information |
Subjects: | |
Online Access: | https://www.mdpi.com/2220-9964/14/7/268 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Earth observation data serve as a fundamental resource in Earth system science. The rapid advancement of remote sensing and in situ measurement technologies has led to the generation of massive volumes of data, accompanied by a growing body of geographic textual information. Efficient and accurate classification and management of these geographic texts has become a critical challenge in the field. However, the effectiveness of traditional classification approaches is hindered by several issues, including data sparsity, class imbalance, semantic ambiguity, and the prevalence of domain-specific terminology. To address these limitations and enable the intelligent management of geographic information, this study proposes an efficient geographic text classification framework based on large language models (LLMs), tailored to the unique semantic and structural characteristics of geographic data. Specifically, LLM-based data augmentation strategies are employed to mitigate the scarcity of labeled data and class imbalance. A semantic vector database is utilized to filter the label space prior to inference, enhancing the model’s adaptability to diverse geographic terms. Furthermore, few-shot prompt learning guides LLMs in understanding domain-specific language, while an output alignment mechanism improves classification stability for complex descriptions. This approach offers a scalable solution for the automated semantic classification of geographic text for unlocking the potential of ever-expanding geospatial big data, thereby advancing intelligent information processing and knowledge discovery in the geospatial domain. |
---|---|
ISSN: | 2220-9964 |