Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval
Cross-modal retrieval is vital at the intersection of vision and language. Specifically, remote sensing image–text retrieval enhances our understanding of complex remote sensing content by combining multiperspective visual information with concise textual descriptions and has increasingly...
Main Authors: | Tianci Sun, Chengyu Zheng, Xiu Li, Yanli Gao, Jie Nie, Lei Huang, Zhiqiang Wei |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | Image-text cross-modal retrieval; remote sensing; prompt engineering |
Online Access: | https://ieeexplore.ieee.org/document/10855571/ |
_version_ | 1839611144380088320 |
---|---|
author | Tianci Sun; Chengyu Zheng; Xiu Li; Yanli Gao; Jie Nie; Lei Huang; Zhiqiang Wei |
author_facet | Tianci Sun; Chengyu Zheng; Xiu Li; Yanli Gao; Jie Nie; Lei Huang; Zhiqiang Wei |
author_sort | Tianci Sun |
collection | DOAJ |
description | Cross-modal retrieval is vital at the intersection of vision and language. Specifically, remote sensing image–text retrieval enhances our understanding of complex remote sensing content by combining multiperspective visual information with concise textual descriptions and has increasingly become a hotspot for research. Existing prompts typically emphasize either global or local information, which fails to excavate or fully leverage the effective information of cross-modal data, resulting in the subpar performance of retrieval models. To address these limitations, we propose a novel method called Strong and Weak Prompt Engineering (SWPE) for remote sensing image–text retrieval. Specifically, SWPE employs the Strong and Weak Prompt Generation module to generate fine-grained and global category semantic prompts via an attention mechanism and a pretrained classification model. The prompt-guided feature fine-tuning module then refines the prompt information using a Transformer architecture, integrating the refined prompts with high-level image and text features to enhance both fine-grained details and global semantics. Finally, the adaptive hard sample elimination module optimizes the triplet loss function by training the model with negative sample pairs of varying difficulty, assigning higher weights to simpler pairs. Extensive quantitative and qualitative experiments on four remote sensing benchmarks validate the superior effectiveness of SWPE. |
format | Article |
id | doaj-art-f2723e37d12c4e078adece0ce4aa6a7c |
institution | Matheson Library |
issn | 1939-1404; 2151-1535 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | doaj-art-f2723e37d12c4e078adece0ce4aa6a7c; 2025-07-28T23:00:09Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18; pp. 6968-6980; DOI 10.1109/JSTARS.2025.3534474; document 10855571; Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval; Tianci Sun (https://orcid.org/0009-0000-4452-8139), Chengyu Zheng (https://orcid.org/0000-0002-5948-0032), Xiu Li (https://orcid.org/0009-0006-8822-5280), Yanli Gao, Jie Nie (https://orcid.org/0000-0003-4952-7666), Lei Huang (https://orcid.org/0000-0003-4087-3677), Zhiqiang Wei (https://orcid.org/0000-0002-2830-8301); all authors: Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China; https://ieeexplore.ieee.org/document/10855571/; Image-text cross-modal retrieval; remote sensing; prompt engineering |
title | Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval |
topic | Image-text cross-modal retrieval; remote sensing; prompt engineering |
url | https://ieeexplore.ieee.org/document/10855571/ |