Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-08-01
Series: Arthroplasty Today
Online Access: http://www.sciencedirect.com/science/article/pii/S2352344125001591
Summary:

Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations using extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study assesses ChatGPT's performance on arthroplasty-related questions in comparison to an Adult Reconstruction Fellow and a senior-level attending.

Methods: A total of 299 questions from the Adult Reconstruction self-assessment on OrthoBullets were evaluated using ChatGPT 4. Performance was analyzed across question categories and compared with that of an Adult Reconstruction Fellow and a senior-level attending arthroplasty surgeon using a chi-square test. Further comparisons assessed ChatGPT's accuracy on image-based questions. Statistical significance was set at a P value ≤ .05.

Results: ChatGPT achieved a 66.9% accuracy rate, compared to 84.3% and 85.3% for the Fellow and Attending, respectively. No significant differences in performance were observed across question categories. ChatGPT performed better on text-only than on image-based questions. Although not statistically significant, ChatGPT showed its highest accuracy on questions that included both an X-ray and a clinical picture.

Conclusions: ChatGPT performed worse than the Adult Reconstruction Fellow and Attending, and it provided more accurate answers when prompted with text-only questions. These findings suggest that while ChatGPT can serve as a useful supplementary resource for arthroplasty topics, it cannot substitute for the clinical judgment required in detailed assessments. Further research is necessary to optimize and validate the use of artificial intelligence in medical education and patient care.
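The headline comparison in the abstract can be reproduced approximately from the reported figures. The sketch below back-calculates correct-answer counts from the stated accuracy rates (200, 252, and 255 of 299 are assumptions derived from the percentages, not the study's raw data) and computes the chi-square statistic for the resulting 2×3 contingency table:

```python
# Minimal sketch of the chi-square comparison described in the abstract.
# Correct-answer counts are back-calculated from the reported accuracy
# rates (66.9%, 84.3%, 85.3% of 299 questions), so they are approximate.

def chi_square(table):
    """Chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

total = 299
correct = [200, 252, 255]  # ChatGPT, Fellow, Attending (approximate counts)
table = [correct, [total - c for c in correct]]  # rows: correct / incorrect

stat = chi_square(table)
print(f"chi2 = {stat:.2f}")  # df = 2; critical value at p = .05 is 5.99
```

With these approximate counts the statistic far exceeds the critical value, consistent with the abstract's finding that ChatGPT performed significantly worse than the two surgeons.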
ISSN: 2352-3441