Visual question answering for medical diagnosis
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-09-01
Series: Intelligent Systems with Applications
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2667305325000717
Summary: The use of Artificial Intelligence (AI) in medical diagnosis is a breakthrough in healthcare, improving both accuracy and efficiency. Recently, significant advances have been made toward multimodal AI systems that can process and integrate multiple types of data, or modalities. This ability is key for interpreting medical images, such as X-rays, CT, and MRI scans, as well as textual data like electronic health records (EHRs) and clinical notes. Visual Question Answering (VQA) systems have demonstrated strong potential in the medical domain. These systems, typically based on Vision-Language Models (VLMs), can answer natural language questions about medical images, offering precise and relevant responses that help doctors make better decisions. In this article, we evaluate existing medical VQA models, along with general-purpose and trending ones, for medical diagnosis. In particular, we focus on abnormality questions, which the literature considers challenging. Our approach consists of evaluating the zero-shot (ZS) general and domain-specific capabilities of different models on two newly created datasets, then fine-tuning the best-performing models on the training set of the abnormality dataset and evaluating them quantitatively and qualitatively. IdeficMed, a generative domain-specific model, achieved better consistency and VQA outcomes while training only 0.22% of its parameters. Additionally, we employed uncertainty quantification techniques (e.g., Monte Carlo dropout) to assess the confidence of the fine-tuned models in their predictions. We also conducted a sensitivity analysis on input perturbations, such as image noise and ambiguous questions.
ISSN: 2667-3053
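
The summary names Monte Carlo dropout among the uncertainty quantification techniques applied to the fine-tuned models. As a rough sketch only (not the authors' code), the idea can be illustrated in PyTorch: dropout layers are kept active at inference time, and the spread of predictions across repeated stochastic forward passes serves as a confidence estimate. The `model(**inputs).logits` interface and the classification-style answer head are assumptions for illustration.

```python
import torch

def mc_dropout_predict(model, inputs, n_samples=20):
    """Monte Carlo dropout: average softmax outputs over stochastic passes.

    The per-class standard deviation across passes is a simple
    uncertainty estimate for the model's prediction.
    """
    model.eval()
    # Re-enable only the dropout layers; all other layers stay in eval mode.
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(**inputs).logits, dim=-1)
            for _ in range(n_samples)
        ])
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean, uncertainty
```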
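The summary also reports that IdeficMed reached its results while training only 0.22% of its parameters, which implies a parameter-efficient fine-tuning scheme; the abstract does not say which one. A minimal sketch, assuming LoRA adapters via Hugging Face's peft library and a public Idefics checkpoint as a stand-in (IdeficMed itself and its exact configuration are not specified here):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq

# Public Idefics checkpoint used as a stand-in for the paper's IdeficMed.
base = AutoModelForVision2Seq.from_pretrained("HuggingFaceM4/idefics-9b")

# Hypothetical LoRA settings; the rank and target modules are assumptions.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports the trainable fraction, e.g. ~0.2%
```

Freezing the base model and training only low-rank adapter matrices is one common way to arrive at a sub-1% trainable-parameter ratio like the one reported.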
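The sensitivity analysis on input perturbations can likewise be sketched: apply noise to the image (or rephrase the question ambiguously) and compare the model's answers before and after. A minimal illustration of the image-noise side, with the noise level as an assumed free parameter:

```python
import numpy as np

def perturb_image(image: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Add zero-mean Gaussian noise to an image normalized to [0, 1]."""
    noisy = image + np.random.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Usage: ask the VQA model the same question on `image` and on
# perturb_image(image, sigma) and check whether the answer changes.
```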