Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-07-01
Series: Journal of Dental Sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S199179022500025X
Summary: Background/purpose: Artificial intelligence (AI) is increasingly vital in dentistry, supporting diagnostics, treatment planning, and patient education. However, AI systems face challenges, especially in delivering accurate information within specialized dental fields. This study aimed to evaluate the performance of seven AI-based chatbots (ChatGPT-3.5, ChatGPT-4, Gemini, Gemini Advanced, Claude AI, Microsoft Copilot, and Smodin AI) in correctly answering prosthodontics questions from the Dental Specialty Exam (DUS) in Turkey.
Materials and methods: The dataset consists of 128 multiple-choice prosthodontics questions administered between 2012 and 2021 in the DUS, a national exam conducted in Turkey by the Student Selection and Placement Center (ÖSYM). Chatbot performance was assessed by categorizing the questions as case-based or knowledge-based.
Results: ChatGPT-4 achieved the highest overall accuracy (75.8%), while Gemini AI had the lowest (46.1%); Gemini AI also gave more incorrect answers (69) than correct ones (59). ChatGPT-4 and ChatGPT-3.5 were significantly more accurate on knowledge-based questions than on case-based ones (p < 0.05). On case-based questions, Gemini and Gemini Advanced had the lowest accuracy (36.4%), while the other chatbots averaged 45.5%. On knowledge-based questions, ChatGPT-4 performed best (78.6%) and Gemini AI worst (47%).
Conclusion: ChatGPT-4 excelled in knowledge-based prosthodontic questions, showing potential to enhance dental education through personalized learning and clinical reasoning support. However, its limitations in case-based scenarios highlight the need for optimization to better handle complex clinical situations. These findings suggest that AI models can contribute meaningfully to dental education and clinical practice.
ISSN: 1991-7902
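The abstract reports p < 0.05 differences between knowledge-based and case-based accuracy but does not state which statistical procedure was used. The following is a minimal, illustrative Python sketch of one way such a comparison could be carried out, assuming a chi-square test of independence on per-question-type correct/incorrect counts; both the counts and the choice of test are hypothetical placeholders, not data or methods taken from the paper.

```python
# Illustrative sketch only: compare a chatbot's accuracy on knowledge-based vs.
# case-based questions with a chi-square test of independence.
# The counts below are hypothetical placeholders, not values from the study.
from scipy.stats import chi2_contingency

# 2x2 contingency table: rows = question type, columns = (correct, incorrect)
table = [
    [91, 25],  # knowledge-based: correct, incorrect (hypothetical)
    [6, 6],    # case-based: correct, incorrect (hypothetical)
]

# chi2_contingency applies Yates' continuity correction by default for 2x2 tables
chi2, p_value, dof, expected = chi2_contingency(table)

for label, (correct, incorrect) in zip(("knowledge-based", "case-based"), table):
    total = correct + incorrect
    print(f"{label}: {correct}/{total} correct ({correct / total:.1%})")

print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

A table like this could be built separately for each chatbot; with small case-based counts, Fisher's exact test would be a reasonable alternative to the chi-square approximation.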