Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation

Interpreting medical diagnostic images is an important activity: the number of images is growing continuously, and the number of specialist radiologists is limited globally, which often results in late diagnosis and possible clinical misinformation. The paper under analysis analyzes the BLIP m...


Bibliographic Details
Main Authors:	Enas Abbas Abed, Taoufik Aguili
Format:	Article
Language:	English
Published:	University of Diyala 2025-06-01
Series:	Diyala Journal of Engineering Sciences
Subjects:	Medical Image Captioning, BLIP Model, Radiology, UMLS, Diagnostic Support, Transformer Models
Online Access:	https://djes.info/index.php/djes/article/view/1752

_version_	1839623477461516288
author	Enas Abbas Abed, Taoufik Aguili
author_facet	Enas Abbas Abed, Taoufik Aguili
author_sort	Enas Abbas Abed
collection	DOAJ
description	Interpreting medical diagnostic images is an important activity: the number of images grows continuously while the number of specialist radiologists is limited globally, which often results in late diagnosis and possible clinical misinformation. This paper examines the BLIP model as an automatic clinical captioning model for medical images. The fine-tuning methodology is based on the ROCO (Radiology Objects in Context) dataset, which provides more than 81,000 radiology images with Unified Medical Language System (UMLS) identifiers. To fit within computational limits, a representative subset of 1,000 images was chosen: 800 images for training, 100 for validation, and 100 for testing, while preserving representation across the major imaging modalities. The model is a transformer-based encoder-decoder with cross-attention mechanisms. The four key contributions of this work are (1) domain-specific fine-tuning of the model to the radiological setting, (2) the use of standardized medical terminology through UMLS concept unique identifiers, (3) integration of explainable AI with attention heatmaps and post-hoc explanations (SHAP and LIME), and (4) evaluation of performance using accepted NLP metrics. The model attained high semantic and clinical agreement, with quantitative scores of 0.7300 (BLEU-4), 0.6101 (METEOR), and 0.8405 (ROUGE). These results suggest that AI-based image captioning has considerable potential to facilitate clinical documentation and to increase the reliability of radiological assessments.
format	Article
id	doaj-art-1694f8c7c96f44e385d4a814b64c93bb
institution	Matheson Library
issn	1999-8716 2616-6909
language	English
publishDate	2025-06-01
publisher	University of Diyala
record_format	Article
series	Diyala Journal of Engineering Sciences
spelling	doaj-art-1694f8c7c96f44e385d4a814b64c93bb; 2025-07-19T23:27:48Z; eng; University of Diyala; Diyala Journal of Engineering Sciences; ISSN 1999-8716, 2616-6909; 2025-06-01; vol. 18, no. 2; DOI 10.24237/djes.2025.18215; Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation; Enas Abbas Abed (Department of Computer Engineering, University of Diyala, Diyala, Iraq); Taoufik Aguili (Department of Communications System, University of Tunis El Manar, Tunisia); https://djes.info/index.php/djes/article/view/1752; Medical Image Captioning; BLIP Model; Radiology; UMLS; Diagnostic Support; Transformer Models
spellingShingle	Enas Abbas Abed Taoufik Aguili Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation Diyala Journal of Engineering Sciences Medical Image Captioning BLIP Model Radiology UMLS Diagnostic Support Transformer Models
title	Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation
title_sort	automated medical image captioning using the blip model enhancing diagnostic support with ai driven language generation
topic	Medical Image Captioning, BLIP Model, Radiology, UMLS, Diagnostic Support, Transformer Models
url	https://djes.info/index.php/djes/article/view/1752
work_keys_str_mv	AT enasabbasabed automatedmedicalimagecaptioningusingtheblipmodelenhancingdiagnosticsupportwithaidrivenlanguagegeneration AT taoufikaguili automatedmedicalimagecaptioningusingtheblipmodelenhancingdiagnosticsupportwithaidrivenlanguagegeneration
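
The description above mentions an 800/100/100 split of a 1,000-image subset that preserves representation across imaging modalities. This record does not say how the authors produced that split, so the following is only a minimal sketch of one common approach (a per-modality, or stratified, split). The function name `stratified_split`, the modality labels, and the per-modality counts are all hypothetical and chosen purely for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (item_id, modality) pairs into train/val/test within each modality.

    Hypothetical helper: the paper's actual procedure is not given in this record.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_modality = defaultdict(list)
    for item_id, modality in records:
        by_modality[modality].append(item_id)

    train, val, test = [], [], []
    for modality in sorted(by_modality):
        ids = by_modality[modality]
        rng.shuffle(ids)
        n_train = round(len(ids) * fractions[0])
        n_val = round(len(ids) * fractions[1])
        # Whatever remains after train and val goes to test
        train += [(i, modality) for i in ids[:n_train]]
        val += [(i, modality) for i in ids[n_train:n_train + n_val]]
        test += [(i, modality) for i in ids[n_train + n_val:]]
    return train, val, test


# Assumed modality counts (not from the paper), summing to 1,000 images
counts = {"CT": 400, "X-ray": 300, "MRI": 200, "Ultrasound": 100}
records = [(f"{m}_{k}", m) for m, n in counts.items() for k in range(n)]

train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # prints: 800 100 100
```

Splitting inside each modality keeps every modality at the same 80/10/10 ratio in all three partitions, which is one way to read "preservation of representation across major imaging modalities".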


Similar Items