Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons
Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations utilizing extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study...
Main Authors: | Benjamin Nieves-Lopez, BS; Clayton Wing, MD; Bryan D. Springer, MD; Keith T. Aziz, MD |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-08-01 |
Series: | Arthroplasty Today |
Subjects: | ChatGPT 4; Arthroplasty; Adult reconstruction; Orthopaedic education |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352344125001591 |
_version_ | 1839629559193927680 |
---|---|
author | Benjamin Nieves-Lopez, BS; Clayton Wing, MD; Bryan D. Springer, MD; Keith T. Aziz, MD |
author_sort | Benjamin Nieves-Lopez, BS |
collection | DOAJ |
description | Background: Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations utilizing extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study assesses ChatGPT’s performance on arthroplasty-related questions in comparison to an Adult Reconstruction Fellow and a senior-level attending surgeon. Methods: A total of 299 questions from the Adult Reconstruction self-assessment on OrthoBullets were evaluated using ChatGPT 4. Performance was analyzed across different question categories and compared with the performance of an Adult Reconstruction Fellow and a senior-level attending arthroplasty surgeon using a chi-square test. Further comparisons were performed to assess ChatGPT’s accuracy rate on image-based questions. Statistical significance was set at P ≤ .05. Results: ChatGPT achieved a 66.9% accuracy rate compared to 84.3% and 85.3% obtained by the Fellow and Attending, respectively. No significant differences in performance were observed across question categories. ChatGPT performed better on text-only questions than on image-based questions. Although the difference was not statistically significant, ChatGPT showed the highest accuracy rate on questions that included both an X-ray and a clinical picture. Conclusions: ChatGPT performed worse than both the Adult Reconstruction Fellow and the Attending, and it provided more accurate answers when prompted with text-only questions. These findings suggest that while ChatGPT can serve as a useful supplementary resource for arthroplasty topics, it cannot substitute for the clinical judgment required in detailed assessments. Further research is necessary to optimize and validate the use of artificial intelligence in medical education and patient care. |
format | Article |
id | doaj-art-80ca5eb2615c408f97a0fb8fad8a0f08 |
institution | Matheson Library |
issn | 2352-3441 |
language | English |
publishDate | 2025-08-01 |
publisher | Elsevier |
record_format | Article |
series | Arthroplasty Today |
spelling | doaj-art-80ca5eb2615c408f97a0fb8fad8a0f08; 2025-07-15T04:16:14Z; eng; Elsevier; Arthroplasty Today; 2352-3441; 2025-08-01; 34; 101772; Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons; Benjamin Nieves-Lopez, BS (University of Puerto Rico, Medical Sciences Campus, San Juan, Puerto Rico; Corresponding author. Dr. Jose Celso Barbosa, San Juan 00646, Puerto Rico. Tel.: +1 787 201 1812.); Clayton Wing, MD (Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL); Bryan D. Springer, MD (Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL); Keith T. Aziz, MD (Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL); http://www.sciencedirect.com/science/article/pii/S2352344125001591; ChatGPT 4; Arthroplasty; Adult reconstruction; Orthopaedic education |
title | Exploring ChatGPT’s Efficacy in Orthopaedic Arthroplasty Questions Compared to Adult Reconstruction Surgeons |
title_sort | exploring chatgpt s efficacy in orthopaedic arthroplasty questions compared to adult reconstruction surgeons |
topic | ChatGPT 4; Arthroplasty; Adult reconstruction; Orthopaedic education |
url | http://www.sciencedirect.com/science/article/pii/S2352344125001591 |