Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval

Cross-modal retrieval is vital at the intersection of vision and language. Specifically, remote sensing image–text retrieval enhances our understanding of complex remote sensing content by combining multiperspective visual information with concise textual descriptions and has increasingly...

Bibliographic Details
Main Authors: Tianci Sun, Chengyu Zheng, Xiu Li, Yanli Gao, Jie Nie, Lei Huang, Zhiqiang Wei
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Image-text cross-modal retrieval; remote sensing; prompt engineering
Online Access: https://ieeexplore.ieee.org/document/10855571/
author Tianci Sun
Chengyu Zheng
Xiu Li
Yanli Gao
Jie Nie
Lei Huang
Zhiqiang Wei
collection DOAJ
description Cross-modal retrieval is vital at the intersection of vision and language. Remote sensing image–text retrieval, in particular, improves our understanding of complex remote sensing content by combining multiperspective visual information with concise textual descriptions, and it has become an increasingly active research topic. Existing prompts typically emphasize either global or local information, so they fail to fully exploit the useful information in cross-modal data, which leads to subpar retrieval performance. To address these limitations, we propose a novel method called Strong and Weak Prompt Engineering (SWPE) for remote sensing image–text retrieval. SWPE first uses the Strong and Weak Prompt Generation module to generate fine-grained and global category semantic prompts via an attention mechanism and a pretrained classification model. The prompt-guided feature fine-tuning module then refines the prompt information using a Transformer architecture and integrates the refined prompts with high-level image and text features to enhance both fine-grained details and global semantics. Finally, the adaptive hard sample elimination module optimizes the triplet loss function by training the model with negative sample pairs of varying difficulty, assigning higher weights to simpler pairs. Extensive quantitative and qualitative experiments on four remote sensing benchmarks validate the effectiveness of SWPE.
format Article
id doaj-art-f2723e37d12c4e078adece0ce4aa6a7c
institution Matheson Library
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-f2723e37d12c4e078adece0ce4aa6a7c
2025-07-28T23:00:09Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
1939-1404
2151-1535
2025-01-01
vol. 18, pp. 6968-6980
10.1109/JSTARS.2025.3534474
10855571
Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval
Tianci Sun, https://orcid.org/0009-0000-4452-8139
Chengyu Zheng, https://orcid.org/0000-0002-5948-0032
Xiu Li, https://orcid.org/0009-0006-8822-5280
Yanli Gao
Jie Nie, https://orcid.org/0000-0003-4952-7666
Lei Huang, https://orcid.org/0000-0003-4087-3677
Zhiqiang Wei, https://orcid.org/0000-0002-2830-8301
Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China (all seven authors)
https://ieeexplore.ieee.org/document/10855571/
Image-text cross-modal retrieval
remote sensing
prompt engineering
title Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval
topic Image-text cross-modal retrieval
remote sensing
prompt engineering
url https://ieeexplore.ieee.org/document/10855571/
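
The description field above mentions a triplet loss that trains on negative pairs of varying difficulty and assigns higher weights to simpler pairs. The record does not give the formula, so the following is only a minimal sketch of that general idea, not the authors' implementation. The function names, the cosine similarity measure, the softmax weighting, and the `margin` and `temperature` values are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_triplet_loss(anchor, positive, negatives, margin=0.2, temperature=0.1):
    """Hinge-based triplet loss with one weight per negative sample.

    Assumption: a negative is "simpler" when it is less similar to the
    anchor, so weights come from a softmax over -similarity / temperature.
    Returns the weighted loss and the list of weights.
    """
    pos_sim = cosine(anchor, positive)
    neg_sims = [cosine(anchor, n) for n in negatives]

    # Numerically stable softmax: lower similarity gives a larger weight.
    logits = [-s / temperature for s in neg_sims]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Standard hinge term for each (anchor, positive, negative) triplet.
    hinges = [max(0.0, margin + s - pos_sim) for s in neg_sims]
    loss = sum(w * h for w, h in zip(weights, hinges))
    return loss, weights


if __name__ == "__main__":
    image = [1.0, 0.0]       # hypothetical image embedding (anchor)
    caption = [1.0, 0.0]     # hypothetical matching caption embedding
    hard_neg = [0.9, 0.1]    # very similar to the anchor, so a hard negative
    easy_neg = [0.0, 1.0]    # orthogonal to the anchor, so an easy negative
    loss, weights = weighted_triplet_loss(image, caption, [hard_neg, easy_neg])
    print("loss:", loss)
    print("weights (hard, easy):", weights)
```

In this sketch the hard negative still contributes its hinge term, but with a much smaller weight than the easy one, which limits how far very hard (and possibly mislabeled) negatives can pull the loss. A real retrieval model would compute this in both directions (image to text and text to image) over a batch.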