Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval

Cross-modal retrieval is vital at the intersection of vision and language. Specifically, remote sensing image–text retrieval enhances our understanding of complex remote sensing content by combining multiperspective visual information with concise textual descriptions and has increasingly...

Bibliographic Details
Main Authors:	Tianci Sun, Chengyu Zheng, Xiu Li, Yanli Gao, Jie Nie, Lei Huang, Zhiqiang Wei
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Image-text cross-modal retrieval; remote sensing; prompt engineering
Online Access:	https://ieeexplore.ieee.org/document/10855571/

_version_	1839611144380088320
author	Tianci Sun; Chengyu Zheng; Xiu Li; Yanli Gao; Jie Nie; Lei Huang; Zhiqiang Wei
author_facet	Tianci Sun; Chengyu Zheng; Xiu Li; Yanli Gao; Jie Nie; Lei Huang; Zhiqiang Wei
author_sort	Tianci Sun
collection	DOAJ
description	Cross-modal retrieval is vital at the intersection of vision and language. Specifically, remote sensing image–text retrieval enhances our understanding of complex remote sensing content by combining multiperspective visual information with concise textual descriptions and has increasingly become a hotspot for research. Existing prompts typically emphasize either global or local information, which fails to excavate or fully leverage the effective information of cross-modal data, resulting in subpar performance of retrieval models. To address these limitations, we propose a novel method called Strong and Weak Prompt Engineering (SWPE) for remote sensing image–text retrieval. Specifically, SWPE employs the Strong and Weak Prompt Generation module to generate fine-grained and global category semantic prompts via an attention mechanism and a pretrained classification model. The prompt-guided feature fine-tuning module then refines the prompt information using a Transformer architecture, integrating the refined prompts with high-level image and text features to enhance both fine-grained details and global semantics. Finally, the adaptive hard sample elimination module optimizes the triplet loss function by training the model with negative sample pairs of varying difficulty, assigning higher weights to simpler pairs. Extensive quantitative and qualitative experiments on four remote sensing benchmarks validate the superior effectiveness of SWPE.
format	Article
id	doaj-art-f2723e37d12c4e078adece0ce4aa6a7c
institution	Matheson Library
issn	1939-1404; 2151-1535
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling	doaj-art-f2723e37d12c4e078adece0ce4aa6a7c; 2025-07-28T23:00:09Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 1939-1404; 2151-1535; 2025-01-01; 18; 6968; 6980; 10.1109/JSTARS.2025.3534474; 10855571; Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval; Tianci Sun (https://orcid.org/0009-0000-4452-8139); Chengyu Zheng (https://orcid.org/0000-0002-5948-0032); Xiu Li (https://orcid.org/0009-0006-8822-5280); Yanli Gao; Jie Nie (https://orcid.org/0000-0003-4952-7666); Lei Huang (https://orcid.org/0000-0003-4087-3677); Zhiqiang Wei (https://orcid.org/0000-0002-2830-8301); Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China (all authors); https://ieeexplore.ieee.org/document/10855571/; Image-text cross-modal retrieval; remote sensing; prompt engineering
title	Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval
topic	Image-text cross-modal retrieval; remote sensing; prompt engineering
url	https://ieeexplore.ieee.org/document/10855571/

Similar Items