VICCA: Visual interpretation and comprehension of chest X-ray anomalies in generated report without human feedback

Bibliographic Details
Main Authors: Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Machine Learning with Applications
Subjects: Chest X-ray, Phrase grounding, Image generation, Interpretability
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827025000672
author Sayeh Gholipour Picha
Dawood Al Chanti
Alice Caplier
collection DOAJ
description As artificial intelligence (AI) becomes increasingly central to healthcare, the demand for explainable and trustworthy models is paramount. Current report generation systems for chest X-rays (CXR) often lack mechanisms for validating outputs without expert oversight, raising concerns about reliability and interpretability. To address these challenges, we propose a novel multimodal framework designed to enhance the semantic alignment between text and image context and the localization accuracy of pathologies within images and reports for AI-generated medical reports. Our framework integrates two key modules: a Phrase Grounding Model, which identifies and localizes pathologies in CXR images based on textual prompts, and a Text-to-Image Diffusion Module, which generates synthetic CXR images from prompts while preserving anatomical fidelity. By comparing features between the original and generated images, we introduce a dual-scoring system: one score quantifies localization accuracy, while the other evaluates semantic consistency between text and image features. Our approach significantly outperforms existing methods in pathology localization, achieving an 8% improvement in Intersection over Union score. It also surpasses state-of-the-art methods in CXR text-to-image generation, with a 1% gain in similarity metrics. Additionally, the integration of phrase grounding with diffusion models, coupled with the dual-scoring evaluation system, provides a robust mechanism for validating report quality, paving the way for more reliable and transparent AI in medical imaging.
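The dual-scoring system described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it only assumes, hypothetically, that the phrase-grounding step yields bounding boxes on the original and diffusion-generated images, and that some shared encoder yields comparable text and image feature vectors. The localization score is then a plain Intersection over Union, and the semantic-consistency score a cosine similarity.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def dual_score(pred_box, ref_box, text_embedding, image_embedding):
    """Return (localization_score, semantic_score) for one pathology mention.

    pred_box / ref_box: boxes from phrase grounding on the original and the
    generated image (hypothetical inputs, not the paper's actual interface).
    text_embedding / image_embedding: features from a shared encoder
    (also hypothetical; the paper's feature extractors are not shown here).
    """
    return iou(pred_box, ref_box), cosine_similarity(text_embedding, image_embedding)

# Toy usage with made-up numbers.
loc, sem = dual_score(
    pred_box=(30, 40, 120, 160),
    ref_box=(35, 45, 130, 150),
    text_embedding=np.random.rand(512),
    image_embedding=np.random.rand(512),
)
print(f"localization (IoU): {loc:.2f}, semantic consistency: {sem:.2f}")
```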
format Article
id doaj-art-bccd2e84bf9c4057b0a5e075ad2c211b
institution Matheson Library
issn 2666-8270
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Machine Learning with Applications
spelling doaj-art-bccd2e84bf9c4057b0a5e075ad2c211b (indexed 2025-06-25T04:52:28Z); Machine Learning with Applications, vol. 21 (2025-09-01), article 100684; Authors: Sayeh Gholipour Picha (corresponding author), Dawood Al Chanti, Alice Caplier, all affiliated with University Grenoble Alpes, Grenoble Institute of Technology, Grenoble, 38000, France
title VICCA: Visual interpretation and comprehension of chest X-ray anomalies in generated report without human feedback
topic Chest X-ray
Phrase grounding
Image generation
Interpretability
url http://www.sciencedirect.com/science/article/pii/S2666827025000672