Evaluating the perspectives of ChatGPT and Gemini on glenohumeral osteoarthritis management

Bibliographic Details
Main Authors: Michael Megafu, DO, MPH, Omar Guerrero, BS, Rafay Hasan, BS, Larry Hunt, MBA, Devri Langhelm, BS, Benning Le, MS, Xinning Li, MD, Robert Kelly, IV, MD, Robert L. Parisien, MD, Antonio Cusano, MD
Format: Article
Language: English
Published: Elsevier 2025-07-01
Series: JSES International
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2666638325000933
Description
Summary: Background: Integrating machine learning and artificial intelligence (AI) technologies has revolutionized various sectors, including health care; however, their application in orthopedic health-care settings still needs improvement. This study sought to evaluate the capacity of Chat Generative Pre-Trained Transformer (ChatGPT) and Gemini to make quality medical recommendations regarding glenohumeral osteoarthritis, weighing them against the recommendations established in the Evidence-Based Clinical Practice Guidelines (CPGs) of the American Academy of Orthopaedic Surgeons (AAOS).

Methods: The 2020 AAOS CPGs, a widely recognized and respected source, were the basis for determining recommended and nonrecommended treatments in this study. ChatGPT and Gemini were queried on 20 treatments drawn from these guidelines: 10 recommended for managing glenohumeral joint osteoarthritis, five not recommended, and five reported as consensus statements. The responses were categorized as “Concordance” or “No Concordance” with the AAOS CPGs, and a Cohen's Kappa coefficient was calculated to assess interrater reliability.

Results: Among the treatments recommended by the AAOS CPGs, ChatGPT and Gemini showed concordance for 10 (100%) and five (50%) treatments, respectively. For the treatments the AAOS CPGs did not recommend, ChatGPT showed concordance for four of the five (80%), while Gemini showed 100% concordance. The Cohen's Kappa coefficient for interrater reliability was 0.90, indicating a very high level of agreement between the two raters in categorizing responses as “Concordance” or “No Concordance” with the AAOS CPGs.

Conclusion: The findings indicate that ChatGPT and Gemini cannot, on their own, be relied upon to reproduce the treatment recommendations outlined in the AAOS CPGs. As patients increasingly turn to external resources such as AI platforms and the Internet for medical recommendations, providers should advise patients to exercise caution when seeking medical advice from these AI platforms for managing glenohumeral joint osteoarthritis.
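The abstract reports a Cohen's Kappa of 0.90 for agreement between the two raters. As a rough illustration of how that statistic is computed, the following is a minimal Python sketch of the standard formula kappa = (p_o - p_e) / (1 - p_e); the two raters' labels shown here are hypothetical placeholders, not the study's actual rating data.

# Minimal sketch of a Cohen's kappa calculation for interrater reliability.
# The ratings below are hypothetical placeholders, not the study's data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[lbl] * counts_b[lbl] for lbl in set(rater_a) | set(rater_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "Concordance" (C) / "No Concordance" (N) calls for 20 treatments.
rater_1 = ["C"] * 15 + ["N"] * 5
rater_2 = ["C"] * 14 + ["N"] * 6
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.88 for these illustrative labels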
ISSN:2666-6383