Contrastive learning based remote sensing text-to-image generation for few-shot remote sensing image captioning
Format: Article
Language: English
Published: Taylor & Francis Group, 2025-12-01
Series: International Journal of Digital Earth
Online Access: https://www.tandfonline.com/doi/10.1080/17538947.2025.2526102
Summary: In few-shot scenarios, the lack of caption-labeled samples and prior knowledge leads to insufficient training and degraded performance of remote sensing image captioning (RC) models. We propose an iterative remote sensing image captioning method, IRIC, to improve RC model performance iteratively and generate higher-quality captions. IRIC first constructs a remote sensing text-to-image model, CRTI, based on contrastive learning, which generates remote sensing images with the same semantic content as the input text and thereby achieves text-driven remote sensing image transformation. Next, caption-labeled sample amplification with prior-knowledge introduction is performed: prior knowledge is incorporated into the text-driven remote sensing image transformation to amplify the set of caption-labeled samples. Finally, the amplified caption-labeled samples are added to the original training set, and the RC model is retrained to achieve iterative performance improvement. Experimental results show that IRIC is highly effective in few-shot scenarios and can iteratively improve the CIDEr score of the latest few-shot RC model by 8.5%.
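The three-step loop the summary describes (train an RC model, amplify the caption-labeled set with text-to-image synthesis, retrain on the enlarged set) can be sketched as plain control flow. This is a minimal illustration, not the authors' implementation: `generate_image` and `retrain_rc` are hypothetical stand-ins for the CRTI generator and the RC captioner training routine, and the prior-knowledge injection is only indicated by a comment.

```python
def iric_iterate(train_set, generate_image, retrain_rc, rounds=2):
    """Sketch of the IRIC iteration loop.

    train_set:      list of (image, caption) pairs
    generate_image: caption -> synthetic image (stand-in for the CRTI model)
    retrain_rc:     train_set -> RC model (stand-in for captioner training)
    """
    # Initial RC model trained on the original few-shot set.
    model = retrain_rc(train_set)
    for _ in range(rounds):
        # Step 2: amplify caption-labeled samples by synthesizing one new
        # image per existing caption. In IRIC, prior knowledge would guide
        # this text-driven remote sensing image transformation.
        amplified = [(generate_image(cap), cap) for _, cap in train_set]
        # Step 3: add the amplified samples to the train set and retrain
        # the RC model, improving performance iteratively.
        train_set = train_set + amplified
        model = retrain_rc(train_set)
    return model, train_set


# Toy usage with trivial stand-ins: each round doubles the train set.
seed = [("img0", "a runway"), ("img1", "a harbor")]
model, data = iric_iterate(seed, lambda cap: f"gen:{cap}", len, rounds=2)
```

With the trivial stand-ins above, two rounds grow the two seed pairs to eight, showing only the data-flow shape of the loop; in the real method the retrained captioner, not the set size, is the object of interest.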
ISSN: 1753-8947, 1753-8955