Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons

Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations utilizing extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study...

Bibliographic Details
Main Authors:	Benjamin Nieves-Lopez, BS; Clayton Wing, MD; Bryan D. Springer, MD; Keith T. Aziz, MD
Format:	Article
Language:	English
Published:	Elsevier 2025-08-01
Series:	Arthroplasty Today
Subjects:	ChatGPT 4; Arthroplasty; Adult reconstruction; Orthopaedic education
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352344125001591

_version_	1839629559193927680
author	Benjamin Nieves-Lopez, BS; Clayton Wing, MD; Bryan D. Springer, MD; Keith T. Aziz, MD
author_facet	Benjamin Nieves-Lopez, BS; Clayton Wing, MD; Bryan D. Springer, MD; Keith T. Aziz, MD
author_sort	Benjamin Nieves-Lopez, BS
collection	DOAJ
description	Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations utilizing extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study assesses ChatGPT’s performance on arthroplasty-related questions in comparison to an Adult Reconstruction Fellow and a Senior level attending. Methods: A total of 299 questions from the Adult Reconstruction self-assessment on OrthoBullets were evaluated using ChatGPT 4. Performance was analyzed across different question categories and compared with the performance of an Adult Reconstruction Fellow and Senior level attending arthroplasty surgeon with a Chi-square test. Further comparisons were performed to assess ChatGPT’s accuracy rate on image-based questions. Statistical significance was set to a P value ≤ .05. Results: ChatGPT achieved a 66.9% accuracy rate compared to 84.3% and 85.3% obtained by the Fellow and Attending, respectively. No significant differences in performance were observed across question categories. ChatGPT demonstrated better results in text-only compared to image-based questions. Although not statistically significant, ChatGPT showed the highest accuracy rate in questions that included both an X-ray and a clinical picture. Conclusions: ChatGPT performed inferior to an Adult Reconstruction Fellow and Attending and it provided more accurate answers when prompted with text-only questions. These findings suggest that while ChatGPT can serve as a useful supplementary resource for arthroplasty topics, it cannot substitute for the clinical judgment required in detailed assessments. Further research is necessary to optimize and validate the use of artificial intelligence in medical education and patient care.
format	Article
id	doaj-art-80ca5eb2615c408f97a0fb8fad8a0f08
institution	Matheson Library
issn	2352-3441
language	English
publishDate	2025-08-01
publisher	Elsevier
record_format	Article
series	Arthroplasty Today
spelling	doaj-art-80ca5eb2615c408f97a0fb8fad8a0f08 | 2025-07-15T04:16:14Z | eng | Elsevier | Arthroplasty Today | 2352-3441 | 2025-08-01 | 34 | 101772 | Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons | Benjamin Nieves-Lopez, BS 0 | Clayton Wing, MD 1 | Bryan D. Springer, MD 2 | Keith T. Aziz, MD 3 | University of Puerto Rico, Medical Sciences Campus, San Juan, Puerto Rico; Corresponding author. Dr. Jose Celso Barbosa, San Juan 00646, Puerto Rico. Tel.: +1 787 201 1812. | Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL | Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL | Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL | Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations utilizing extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study assesses ChatGPT’s performance on arthroplasty-related questions in comparison to an Adult Reconstruction Fellow and a Senior level attending. Methods: A total of 299 questions from the Adult Reconstruction self-assessment on OrthoBullets were evaluated using ChatGPT 4. Performance was analyzed across different question categories and compared with the performance of an Adult Reconstruction Fellow and Senior level attending arthroplasty surgeon with a Chi-square test. Further comparisons were performed to assess ChatGPT’s accuracy rate on image-based questions. Statistical significance was set to a P value ≤ .05. Results: ChatGPT achieved a 66.9% accuracy rate compared to 84.3% and 85.3% obtained by the Fellow and Attending, respectively. No significant differences in performance were observed across question categories. ChatGPT demonstrated better results in text-only compared to image-based questions. Although not statistically significant, ChatGPT showed the highest accuracy rate in questions that included both an X-ray and a clinical picture. Conclusions: ChatGPT performed inferior to an Adult Reconstruction Fellow and Attending and it provided more accurate answers when prompted with text-only questions. These findings suggest that while ChatGPT can serve as a useful supplementary resource for arthroplasty topics, it cannot substitute for the clinical judgment required in detailed assessments. Further research is necessary to optimize and validate the use of artificial intelligence in medical education and patient care. | http://www.sciencedirect.com/science/article/pii/S2352344125001591 | ChatGPT 4 | Arthroplasty | Adult reconstruction | Orthopaedic education
title	Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons
topic	ChatGPT 4; Arthroplasty; Adult reconstruction; Orthopaedic education
url	http://www.sciencedirect.com/science/article/pii/S2352344125001591


Similar Items