Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation

Interpreting medical images is an important diagnostic activity: the number of images is growing continuously while the number of specialist radiologists is limited globally, which often results in late diagnoses and possible clinical misinformation. The paper under analysis analyzes the BLIP m...

Bibliographic Details
Main Authors: Enas Abbas Abed, Taoufik Aguili
Format: Article
Language: English
Published: University of Diyala 2025-06-01
Series: Diyala Journal of Engineering Sciences
Subjects: Medical Image Captioning; BLIP Model; Radiology; UMLS; Diagnostic Support; Transformer Models
Online Access: https://djes.info/index.php/djes/article/view/1752
author Enas Abbas Abed
Taoufik Aguili
collection DOAJ
description Interpreting medical images is an important diagnostic activity: the number of images is growing continuously while the number of specialist radiologists is limited globally, which often results in late diagnoses and possible clinical misinformation. This paper analyzes the BLIP model as an automatic clinical captioning model for medical images. To fine-tune BLIP, a methodology was designed around the ROCO (Radiology Objects in Context) dataset, which provides more than 81,000 radiology images annotated with Unified Medical Language System (UMLS) identifiers. A representative subset of 1,000 images was chosen to fit within computational limitations: 800 images were used for training, 100 for validation, and 100 for testing, while preserving representation across the major imaging modalities. The model is a transformer-based encoder-decoder with cross-attention mechanisms. The four key contributions of this work are (1) domain-specific fine-tuning of the model to the radiological setting, (2) the use of standardized medical terminology through UMLS concept unique identifiers, (3) integration of explainable AI with attention heatmaps and post-hoc explanations (SHAP and LIME), and (4) evaluation of performance using accepted NLP metrics. The model attained high semantic and clinical agreement, with quantitative scores of 0.7300 (BLEU-4), 0.6101 (METEOR), and 0.8405 (ROUGE). These results suggest that AI-based image captioning has considerable potential to facilitate clinical documentation and increase the reliability of radiological assessments.
format Article
id doaj-art-1694f8c7c96f44e385d4a814b64c93bb
institution Matheson Library
issn 1999-8716
2616-6909
language English
publishDate 2025-06-01
publisher University of Diyala
record_format Article
series Diyala Journal of Engineering Sciences
spelling doaj-art-1694f8c7c96f44e385d4a814b64c93bb
2025-07-19T23:27:48Z
eng
University of Diyala
Diyala Journal of Engineering Sciences
1999-8716
2616-6909
2025-06-01
Vol. 18, No. 2
DOI: 10.24237/djes.2025.18215
Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation
Enas Abbas Abed, Department of Computer Engineering, University of Diyala, Diyala, Iraq
Taoufik Aguili, Department of Communications System, University of Tunis El Manar, Tunisia
https://djes.info/index.php/djes/article/view/1752
Medical Image Captioning
BLIP Model
Radiology
UMLS
Diagnostic Support
Transformer Models
title Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation
topic Medical Image Captioning
BLIP Model
Radiology
UMLS
Diagnostic Support
Transformer Models
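
The abstract describes drawing a 1,000-image subset and dividing it 800/100/100 while preserving representation across imaging modalities. The record does not say how that split was performed, so the following is only a minimal sketch of one way such a modality-preserving split could be done. The function name `stratified_split`, the `(image_id, modality)` item layout, and the fixed seed are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_split(items, sizes=(800, 100, 100), seed=0):
    """Split (image_id, modality) pairs into train/val/test lists.

    Hypothetical helper: each modality keeps roughly the same share
    in every split. Not the authors' procedure, only an illustration.
    """
    if sum(sizes) != len(items):
        raise ValueError("split sizes must sum to the number of items")

    rng = random.Random(seed)

    # Group the items by modality label.
    groups = defaultdict(list)
    for image_id, modality in items:
        groups[modality].append((image_id, modality))

    # Shuffle inside each group, then give every item a fractional
    # rank in (0, 1) relative to its own group.
    keyed = []
    for modality in sorted(groups):
        members = groups[modality]
        rng.shuffle(members)
        for i, item in enumerate(members):
            keyed.append(((i + 0.5) / len(members), item))

    # Sorting by fractional rank interleaves the groups, so any
    # contiguous slice holds a proportional share of each modality.
    keyed.sort(key=lambda pair: pair[0])
    ordered = [item for _, item in keyed]

    n_train, n_val, _ = sizes
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test
```

Calling `stratified_split(items)` on a list of 1,000 labeled items returns three lists of 800, 100, and 100 items. Sorting by within-group rank is used here because it yields exact split sizes without per-group rounding corrections.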