Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons

Bibliographic Details
Main Authors: Benjamin Nieves-Lopez, BS, Clayton Wing, MD, Bryan D. Springer, MD, Keith T. Aziz, MD
Format: Article
Language: English
Published: Elsevier, 2025-08-01
Series: Arthroplasty Today
Subjects: ChatGPT 4; Arthroplasty; Adult reconstruction; Orthopaedic education
Online Access:http://www.sciencedirect.com/science/article/pii/S2352344125001591

Abstract

Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to hold conversations, trained on extensive data from the internet. Despite its potential, its utility in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study assesses ChatGPT’s performance on arthroplasty-related questions compared with that of an Adult Reconstruction Fellow and a senior-level attending surgeon.

Methods: A total of 299 questions from the Adult Reconstruction self-assessment on OrthoBullets were submitted to ChatGPT 4. Performance was analyzed across question categories and compared with that of an Adult Reconstruction Fellow and a senior-level attending arthroplasty surgeon using a chi-square test. Additional comparisons assessed ChatGPT’s accuracy on image-based questions. Statistical significance was set at P ≤ .05.

Results: ChatGPT achieved an accuracy rate of 66.9%, compared with 84.3% for the Fellow and 85.3% for the Attending. No significant differences in performance were observed across question categories. ChatGPT performed better on text-only questions than on image-based questions. It showed its highest accuracy on questions that included both an X-ray and a clinical picture, although this difference was not statistically significant.

Conclusions: ChatGPT performed worse than both the Adult Reconstruction Fellow and the Attending, and it answered text-only questions more accurately than image-based ones. These findings suggest that while ChatGPT can serve as a useful supplementary resource for arthroplasty topics, it cannot substitute for the clinical judgment required in detailed assessments. Further research is needed to optimize and validate the use of artificial intelligence in medical education and patient care.
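
As a worked illustration of the chi-square comparison described in the Methods, the following Python sketch tests ChatGPT's accuracy against each surgeon's with a Pearson chi-square test on a 2 × 2 table of correct and incorrect answers. It rests on assumptions not stated in this record: that each respondent answered all 299 questions, and that the correct-answer counts are 200, 252, and 255, the only integers out of 299 that round to 66.9%, 84.3%, and 85.3%. The abstract also does not say whether a continuity correction was applied, so this uncorrected test is not the authors' analysis, and its output should not be read as the study's reported P values. The helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test without continuity correction.

    Table layout (rows = respondents, columns = outcomes):
        [[a, b],   e.g. ChatGPT: correct, incorrect
         [c, d]]   e.g. surgeon: correct, incorrect
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Closed form for a 2x2 table: n * (ad - bc)^2 / (product of the four margins)
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 = Z^2, so the upper-tail probability is erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


TOTAL = 299  # questions in the OrthoBullets Adult Reconstruction self-assessment

# ASSUMPTION: counts back-calculated from the rounded accuracy rates in the
# abstract, with each respondent assumed to have answered all 299 questions.
CHATGPT_CORRECT = 200  # 200/299 ≈ 66.9%
SURGEON_CORRECT = {"Fellow": 252, "Attending": 255}  # ≈ 84.3%, ≈ 85.3%

for name, correct in SURGEON_CORRECT.items():
    stat, p = chi_square_2x2(
        CHATGPT_CORRECT, TOTAL - CHATGPT_CORRECT,
        correct, TOTAL - correct,
    )
    print(f"ChatGPT vs {name}: chi2 = {stat:.2f}, p = {p:.2g}, p <= .05: {p <= 0.05}")
```

The closed-form 2 × 2 statistic keeps the sketch dependency-free. With SciPy installed, `scipy.stats.chi2_contingency(table, correction=False)` should return the same statistic, while its default (`correction=True`) applies Yates' continuity correction and yields a smaller value.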

Source: Arthroplasty Today, vol. 34, article 101772 (2025-08-01). ISSN 2352-3441.

Author Affiliations
Benjamin Nieves-Lopez, BS: University of Puerto Rico, Medical Sciences Campus, San Juan, Puerto Rico (corresponding author; Dr. Jose Celso Barbosa, San Juan 00646, Puerto Rico; Tel.: +1 787 201 1812)
Clayton Wing, MD: Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL
Bryan D. Springer, MD: Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL
Keith T. Aziz, MD: Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL