Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications, 2025-07-01 |
Series: | JMIR Medical Informatics |
Online Access: | https://medinform.jmir.org/2025/1/e68980 |
Summary: | Abstract
Background: The integration of artificial intelligence (AI) into clinical workflows holds promise for enhancing outpatient decision-making and patient education. ChatGPT, a large language model developed by OpenAI, has gained attention for its potential to support both clinicians and patients. However, its performance in the outpatient setting of general surgery remains underexplored.
Objective: This study aimed to evaluate whether ChatGPT-4 can function as a virtual outpatient assistant in the management of puerperal mastitis by assessing the accuracy, clarity, and clinical safety of its responses to frequently asked patient questions in Turkish.
Methods: Fifteen questions about puerperal mastitis were sourced from public health care websites and online forums. These questions were categorized into general information (n=2), symptoms and diagnosis (n=6), treatment (n=2), and prognosis (n=5). Each question was entered into ChatGPT-4 (September 3, 2024), and a single Turkish-language response was obtained. The responses were evaluated by a panel of 3 board-certified general surgeons and 2 general surgery residents using five criteria: sufficient length, patient-understandable language, accuracy, adherence to current guidelines, and patient safety. Quantitative metrics included the DISCERN score, the Flesch-Kincaid readability score, and inter-rater reliability assessed using the intraclass correlation coefficient (ICC).
Results: A total of 15 questions were evaluated. ChatGPT’s responses were rated as “excellent” overall by the evaluators, with higher scores observed for treatment- and prognosis-related questions. A statistically significant difference was found in DISCERN scores across question types [...]
Conclusions: ChatGPT demonstrated adequate capability in providing information on puerperal mastitis, particularly for treatment and prognosis. However, evaluator variability and the subjective nature of assessments highlight the need for further optimization of AI tools. Future research should emphasize iterative questioning and dynamic updates to AI knowledge bases to enhance reliability and accessibility. |
---|---|
ISSN: | 2291-9694 |