A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow
The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual information into structured semantic descriptions. How...
Saved in:
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2025-06-01
|
Series: | Journal of Marine Science and Engineering |
Subjects: | |
Online Access: | https://www.mdpi.com/2077-1312/13/7/1204 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general purpose models often struggle to perform effectively in complex maritime environments due to limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a transformer dual-stream information (TDSI) model. The proposed model uses a Swin-transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure independently encodes the grid and segmentation features, which are subsequently fused through a feature fusion module for implicit integration. A decoder with a cross-attention mechanism is then employed to generate descriptive captions for maritime images. Extensive experiments were conducted using the constructed maritime semantic segmentation and maritime image captioning datasets. The results demonstrate that the proposed TDSI model outperforms existing mainstream methods in terms of several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr. These findings confirm the effectiveness of the TDSI model in enhancing image captioning performance in maritime environments. |
---|---|
ISSN: | 2077-1312 |