To Self-Treat or Not to Self-Treat: Evaluating the Diagnostic, Advisory and Referral Effectiveness of ChatGPT Responses to the Most Common Musculoskeletal Disorders

Bibliographic Details
Main Authors:	Ufuk Arzu, Batuhan Gencer
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Diagnostics
Subjects:	ChatGPT self-diagnosis self-treatment readability Flesch–Kincaid Grade Level trauma
Online Access:	https://www.mdpi.com/2075-4418/15/14/1834

Description
Summary:	Background/Objectives: The increased accessibility of information has resulted in a rise in patients trying to self-diagnose and opting for self-medication, either as a primary treatment or as a supplement to medical care. Our objective was to evaluate the reliability, comprehensibility, and readability of the responses provided by ChatGPT 4.0 when queried about the most prevalent orthopaedic problems, thus ascertaining the occurrence of misguidance and the necessity for an audit of the disseminated information. Methods: ChatGPT 4.0 was presented with 26 open-ended questions. The responses were evaluated by two observers using a Likert scale in the categories of diagnosis, recommendation, and referral. The scores from the responses were subjected to subgroup analysis according to the area of interest (AoI) and anatomical region. The readability and comprehensibility of the chatbot’s responses were analyzed using the Flesch–Kincaid Reading Ease Score (FRES) and Flesch–Kincaid Grade Level (FKGL). Results: The majority of the responses were rated as either ‘adequate’ or ‘excellent’. However, in the diagnosis category, a significant difference was found in the evaluation made according to the AoI (p = 0.007), which is attributed to trauma-related questions. No significant difference was identified in any other category. The mean FKGL score was 7.8 ± 1.267, and the mean FRES was 52.68 ± 8.6. The average estimated reading level required to understand the text was considered as “high school”. Conclusions: ChatGPT 4.0 facilitates the self-diagnosis and self-treatment tendencies of patients with musculoskeletal disorders. However, it is imperative for patients to have a robust understanding of the limitations of chatbot-generated advice, particularly in trauma-related conditions.
ISSN:	2075-4418

Similar Items