Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation

Bibliographic Details
Main Authors: Joon Yul Choi, Tae Keun Yoo
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-09-01
Series: Informatics and Health
Online Access: http://www.sciencedirect.com/science/article/pii/S2949953425000219
Description
Summary:

Background: Large language models (LLMs) such as ChatGPT-4o have demonstrated emerging capabilities in medical reasoning and image interpretation. However, their diagnostic applicability in ophthalmology, particularly across diverse imaging modalities, remains insufficiently characterized. This study evaluates ChatGPT-4o’s performance in ophthalmic image interpretation, exemplar-guided reasoning (in-context learning), and code-free diagnostic tool generation using publicly available datasets.

Methods: We assessed ChatGPT-4o through three clinically relevant tasks: (1) image interpretation without prior examples, using fundus, external ocular, and facial photographs representing key ophthalmic conditions; (2) in-context learning with example-based prompts to improve classification accuracy; and (3) generation of an interactive HTML-based decision-support tool from a clinical diagnostic algorithm. All evaluations were performed using open-access datasets without model fine-tuning.

Results: When interpreting images without reference examples, ChatGPT-4o achieved diagnostic accuracies of 90.3 % for diabetic retinopathy, 77.4 % for age-related macular degeneration, 100 % for conjunctival melanoma, 97.3 % for pterygium, and 85.7 % for strabismus subtypes. In-context learning consistently improved diagnostic performance across all modalities, with strabismus classification reaching 100 % accuracy. Compared with EfficientNetB2, ChatGPT-4o demonstrated comparable or superior performance in several diagnostic tasks. Additionally, the model successfully translated schematic clinical algorithms into functional, browser-based diagnostic tools using natural language prompts alone.

Conclusions: ChatGPT-4o demonstrates promise in ophthalmic image interpretation and low-code clinical tool development, particularly when guided by in-context learning. However, these findings are based on a limited diagnostic spectrum and publicly available datasets. Broader clinical validation and head-to-head comparisons with domain-specific models are needed to establish its practical utility in ophthalmology.
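
Illustrative sketch (not from the article): the summary describes in-context learning as prompting with labeled examples and reports results as diagnostic accuracy. The Python below is a minimal sketch of those two ideas under stated assumptions. The names `Exemplar`, `build_few_shot_prompt`, and `accuracy`, the prompt wording, the file references, and the example labels are all hypothetical; the authors' actual prompts and evaluation code are not given in this record, and no model API is called here.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """A labeled example image shown to the model before the query (hypothetical)."""
    image_ref: str  # placeholder identifier for an example image
    label: str      # diagnosis label paired with that image


def build_few_shot_prompt(task, labels, exemplars, query_image_ref):
    """Assemble prompt parts in order: instruction, labeled exemplars, unlabeled query.

    The part dictionaries are a generic, made-up structure, not any vendor's schema.
    """
    parts = [{"type": "text",
              "text": f"{task} Answer with one of: {', '.join(labels)}."}]
    for ex in exemplars:
        parts.append({"type": "image", "ref": ex.image_ref})
        parts.append({"type": "text", "text": f"Label: {ex.label}"})
    # The query image comes last, with the label left for the model to fill in.
    parts.append({"type": "image", "ref": query_image_ref})
    parts.append({"type": "text", "text": "Label:"})
    return parts


def accuracy(predictions, truths):
    """Fraction of predictions that exactly match the reference labels."""
    if not truths or len(predictions) != len(truths):
        raise ValueError("predictions and truths must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


if __name__ == "__main__":
    # Placeholder labels and file names, chosen only for illustration.
    prompt = build_few_shot_prompt(
        task="Classify the strabismus subtype in the final photograph.",
        labels=["esotropia", "exotropia"],
        exemplars=[Exemplar("example_1.jpg", "esotropia"),
                   Exemplar("example_2.jpg", "exotropia")],
        query_image_ref="query.jpg",
    )
    for part in prompt:
        print(part)
```

Omitting the exemplars from the same prompt gives the "without reference examples" condition, so the two settings differ only in the prompt content, consistent with the summary's statement that no model fine-tuning was performed.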
ISSN:2949-9534