Visual question answering for medical diagnosis
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-09-01
Series: Intelligent Systems with Applications
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2667305325000717
Summary: The use of Artificial Intelligence (AI) in medical diagnosis is a breakthrough in healthcare, improving both accuracy and efficiency. Recently, significant advances have been made toward multimodal AI systems that can process and integrate multiple types of data, or modalities. This ability is key for interpreting medical images, such as X-rays, CT, and MRI scans, as well as textual data like electronic health records (EHRs) and clinical notes. Visual Question Answering (VQA) systems have demonstrated strong potential in the medical domain. These systems, typically based on Vision-Language Models (VLMs), can answer natural language questions about medical images, offering precise and relevant responses that help doctors make better decisions. In this article, we evaluate existing medical VQA models, along with general-purpose and trending ones, for medical diagnosis. In particular, we focus on abnormality questions, which the literature considers challenging. Our approach consists of evaluating the zero-shot (ZS) general and domain-specific capabilities of different models on two newly created datasets, then fine-tuning the best-performing models on the training set of the abnormality dataset and evaluating them quantitatively and qualitatively. IdeficMed, a generative domain-specific model, achieved better consistency and VQA outcomes while training only 0.22% of its parameters. Additionally, we employed uncertainty quantification techniques (e.g., Monte Carlo dropout) to assess the confidence of the fine-tuned models in their predictions. We also conducted a sensitivity analysis on input perturbations, such as image noise and ambiguous questions.
ISSN: 2667-3053
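
The summary names Monte Carlo dropout among the uncertainty quantification techniques applied to the fine-tuned models. As a rough sketch only (not the authors' code), the idea can be illustrated in PyTorch: dropout layers are kept active at inference time, and the spread of predictions across repeated stochastic forward passes serves as a confidence estimate. The `model(**inputs).logits` interface and the classification-style answer head are assumptions for illustration.

```python
import torch

def mc_dropout_predict(model, inputs, n_samples=20):
    """Monte Carlo dropout: average softmax outputs over stochastic passes.

    The per-class standard deviation across passes is a simple
    uncertainty estimate for the model's prediction.
    """
    model.eval()
    # Re-enable only the dropout layers; all other layers stay in eval mode.
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(**inputs).logits, dim=-1)
            for _ in range(n_samples)
        ])
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean, uncertainty
```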
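The summary also reports that IdeficMed reached its results while training only 0.22% of its parameters, which implies a parameter-efficient fine-tuning scheme; the abstract does not say which one. A minimal sketch, assuming LoRA adapters via Hugging Face's peft library and a public Idefics checkpoint as a stand-in (IdeficMed itself and its exact configuration are not specified here):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq

# Public Idefics checkpoint used as a stand-in for the paper's IdeficMed.
base = AutoModelForVision2Seq.from_pretrained("HuggingFaceM4/idefics-9b")

# Hypothetical LoRA settings; the rank and target modules are assumptions.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports the trainable fraction, e.g. ~0.2%
```

Freezing the base model and training only low-rank adapter matrices is one common way to arrive at a sub-1% trainable-parameter ratio like the one reported.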
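The sensitivity analysis on input perturbations can likewise be sketched: apply noise to the image (or rephrase the question ambiguously) and compare the model's answers before and after. A minimal illustration of the image-noise side, with the noise level as an assumed free parameter:

```python
import numpy as np

def perturb_image(image: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Add zero-mean Gaussian noise to an image normalized to [0, 1]."""
    noisy = image + np.random.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Usage: ask the VQA model the same question on `image` and on
# perturb_image(image, sigma) and check whether the answer changes.
```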