Integrating Abstract Meaning Representation to Enhance Transformer-Based Image Captioning
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/11058972/ |
Summary: | Although recent image captioning models have achieved substantial progress, they still encounter limitations in capturing abstract semantics, resulting in insufficient semantic depth and limited diversity in expression. Meanwhile, Abstract Meaning Representation (AMR), a form of abstract semantic representation, has been successfully applied in various natural language processing tasks. However, exploiting AMR in multimodal contexts, particularly for image captioning, remains largely unexplored. To address these limitations, this paper proposes a novel image captioning model within an encoder-decoder framework that leverages the abstract semantics of images through AMR. Specifically, AMR is incorporated into the model in two ways: 1) extracting AMR from ground-truth captions and 2) converting the image’s relational graph into an AMR-like graph to enrich abstract semantics. These AMR embeddings are fused with object-region features and relational-graph embeddings via a cross-modal attention mechanism. Additionally, embeddings from the AMR-like graph are integrated into the Transformer decoder using a masked multi-head attention mechanism to enhance semantic coherence during caption generation. Experimental results on the MS COCO and Flickr30k datasets demonstrate that the proposed model achieves superior captioning accuracy compared to recent state-of-the-art methods, confirming the effectiveness of incorporating AMR in image captioning tasks. |
---|---|
ISSN: | 2169-3536 |
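
The summary states that AMR embeddings are fused with object-region features via a cross-modal attention mechanism. As a rough illustration of what such a fusion step can look like, the following minimal sketch applies single-head scaled dot-product attention with region features as queries and AMR node embeddings as keys and values. The function names, the dimensions, the random weights, and the residual sum used for fusion are all assumptions made for illustration; the record does not describe the paper's actual architecture at this level of detail.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: (n_q, d) features from one modality, e.g. object regions
    context: (n_c, d) features from the other, e.g. AMR node embeddings
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_q, n_c)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

# Hypothetical sizes: 5 object regions, 3 AMR nodes, 8-dim embeddings.
rng = np.random.default_rng(0)
d_model = 8
regions = rng.normal(size=(5, d_model))
amr_nodes = rng.normal(size=(3, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

attended, weights = cross_attention(regions, amr_nodes, w_q, w_k, w_v)
# Assumed fusion: residual sum of region features and attended AMR context.
fused = regions + attended
```

Here each region attends over all AMR nodes, so `fused` keeps one row per region while mixing in AMR-derived context. A trained model would learn the projection matrices and would typically use multiple heads and layer normalization.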