Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation
Main Authors: | |
Format: | Article |
Language: | English |
Published: | University of Diyala, 2025-06-01 |
Series: | Diyala Journal of Engineering Sciences |
Subjects: | |
Online Access: | https://djes.info/index.php/djes/article/view/1752 |
Summary: | Interpretation of medical images is a critical diagnostic activity: the volume of images grows continuously while the number of specialist radiologists remains limited worldwide, which often results in delayed diagnosis and potential clinical misinformation. This paper analyzes the BLIP model as an automatic clinical captioning model for medical images. To fine-tune BLIP, a methodology was designed around more than 81,000 radiology images annotated with Unified Medical Language System (UMLS) identifiers, obtained from the ROCO (Radiology Objects in Context) dataset. A representative subset of 1,000 images was selected to fit within computational limitations: 800 images were used for training, 100 for validation, and 100 for testing, while preserving representation across the major imaging modalities. The model was trained as a transformer-based encoder-decoder with cross-attention mechanisms. The four key contributions of this work are (1) domain-specific fine-tuning of the model for the radiological setting, (2) the use of standardized medical terminology through UMLS concept unique identifiers, (3) integration of explainable AI via attention heatmaps and post-hoc explanations (SHAP and LIME), and (4) evaluation of performance using accepted NLP metrics. The model attained high semantic and clinical agreement, with quantitative scores of 0.7300 (BLEU-4), 0.6101 (METEOR), and 0.8405 (ROUGE). These results suggest that AI-based image captioning has considerable potential to facilitate clinical documentation and increase the reliability of radiological assessments. (A hedged code sketch of the fine-tuning setup follows this record.) |
ISSN: | 1999-8716, 2616-6909 |
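
To make the fine-tuning workflow described in the summary concrete, the following is a minimal sketch assuming Hugging Face's `transformers` library with the `Salesforce/blip-image-captioning-base` checkpoint, a list of local `(image_path, caption)` pairs standing in for the ROCO subset, and illustrative hyperparameters. None of these specifics come from the paper itself; they are assumptions for demonstration only.

```python
# Hedged sketch: fine-tuning a pretrained BLIP checkpoint on radiology
# image-caption pairs, roughly mirroring the workflow the abstract describes.
# Paths, captions, and hyperparameters below are placeholders, not the
# paper's actual configuration.
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from transformers import BlipProcessor, BlipForConditionalGeneration


class RadiologyCaptionDataset(Dataset):
    """Wraps (image_path, caption) pairs and tokenizes them for BLIP."""

    def __init__(self, samples, processor, max_length=64):
        self.samples = samples
        self.processor = processor
        self.max_length = max_length

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, caption = self.samples[idx]
        image = Image.open(path).convert("RGB")
        enc = self.processor(
            images=image,
            text=caption,
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        return {k: v.squeeze(0) for k, v in enc.items()}


processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Placeholder training pairs; a real run would load the 800-image training subset.
train_samples = [("example_cxr.png", "Frontal chest radiograph with no acute findings.")]
loader = DataLoader(RadiologyCaptionDataset(train_samples, processor),
                    batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):  # illustrative epoch count
    for batch in loader:
        # BLIP returns a language-modeling loss when labels are supplied.
        outputs = model(pixel_values=batch["pixel_values"],
                        input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Caption a held-out image with the fine-tuned model.
model.eval()
with torch.no_grad():
    image = Image.open("held_out_cxr.png").convert("RGB")  # placeholder path
    inputs = processor(images=image, return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=64, num_beams=3)
    print(processor.decode(generated[0], skip_special_tokens=True))
```

Generated captions would then be scored against the reference reports with BLEU-4, METEOR, and ROUGE (for example via `nltk` or the `evaluate` library) to reproduce the kind of quantitative comparison reported in the summary.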