A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow

Bibliographic Details
Main Authors:	Zhenqiang Zhao, Helong Shen, Meng Wang, Yufei Wang
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Journal of Marine Science and Engineering
Subjects:	intelligent ships image captioning generation transformer
Online Access:	https://www.mdpi.com/2077-1312/13/7/1204

Description
Summary:	The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general-purpose models often struggle to perform effectively in complex maritime environments due to limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a transformer dual-stream information (TDSI) model. The proposed model uses a Swin Transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure encodes the grid and segmentation features independently, and a feature fusion module then integrates them implicitly. A decoder with a cross-attention mechanism then generates descriptive captions for maritime images. Extensive experiments were conducted on the maritime semantic segmentation and maritime image captioning datasets constructed for this study. The results demonstrate that the proposed TDSI model outperforms existing mainstream methods on several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr. These findings confirm the effectiveness of the TDSI model in enhancing image captioning performance in maritime environments.
ISSN:	2077-1312
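The dual-stream pipeline described in the summary (two independent encoders, a fusion step, then a cross-attention decoder) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The random arrays stand in for Swin Transformer grid features and SegFormer features, which are assumed to be already pooled to a common 7x7 grid with a hypothetical width D = 64. Each encoder and decoder stage is reduced to a single-head attention layer with untrained weights. The sigmoid gate in the hypothetical `GatedFusion` class is only one plausible reading of the "implicit integration" step, because the abstract does not specify how the fusion module works.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # hypothetical model width


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance (no learned scale/shift)
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def attention(q, k, v, causal=False):
    # Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Each query position may attend only to itself and earlier positions
        allowed = np.tri(scores.shape[0], scores.shape[1], dtype=bool)
        scores = np.where(allowed, scores, -np.inf)
    return softmax(scores) @ v


class AttnLayer:
    """One attention sublayer (residual + layer norm) with random, untrained weights."""

    def __init__(self, d):
        self.wq, self.wk, self.wv = (rng.normal(0, d ** -0.5, (d, d)) for _ in range(3))

    def __call__(self, x, memory=None, causal=False):
        src = x if memory is None else memory  # self-attention vs. cross-attention
        out = attention(x @ self.wq, src @ self.wk, src @ self.wv, causal=causal)
        return layer_norm(x + out)


class GatedFusion:
    """Hypothetical fusion module: a per-element sigmoid gate mixes the two streams."""

    def __init__(self, d):
        self.wg = rng.normal(0, (2 * d) ** -0.5, (2 * d, d))

    def __call__(self, g, s):
        gate = 1.0 / (1.0 + np.exp(-(np.concatenate([g, s], axis=-1) @ self.wg)))
        return gate * g + (1.0 - gate) * s  # convex combination of grid and segmentation tokens


# Stand-ins for real features (assumed pre-extracted and aligned to a 7x7 grid)
grid_feats = rng.normal(size=(49, D))  # Swin Transformer grid features
seg_feats = rng.normal(size=(49, D))   # SegFormer scene features pooled to the same grid
word_embs = rng.normal(size=(5, D))    # embeddings of a 5-token partial caption

grid_enc, seg_enc = AttnLayer(D), AttnLayer(D)  # two independent encoders (no shared weights)
fusion = GatedFusion(D)
dec_self, dec_cross = AttnLayer(D), AttnLayer(D)

memory = fusion(grid_enc(grid_feats), seg_enc(seg_feats))
hidden = dec_cross(dec_self(word_embs, causal=True), memory=memory)  # words attend to fused image tokens
print(memory.shape, hidden.shape)  # (49, 64) (5, 64)
```

The sketch omits multi-head splitting, feed-forward sublayers, positional encodings, layer stacking, and the final projection to the caption vocabulary. A gate was chosen here because it keeps the fused memory the same size as either stream; concatenating the two token sequences before the decoder would be an equally plausible alternative.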

Similar Items