Performance Evaluation of Large Language Model Chatbots for Radiation Therapy Education


Bibliographic Details
Main Authors: Jae-Hong Jung, Daegun Kim, Kyung-Bae Lee, Youngjin Lee
Format: Article
Language:English
Published: MDPI AG 2025-06-01
Series:Information
Online Access:https://www.mdpi.com/2078-2489/16/7/521
Description
Summary:This study aimed to develop a large language model (LLM) chatbot for radiation therapy education and compare the performance of portable document format (PDF)- and webpage-based question-and-answer (Q&A) chatbots. An LLM chatbot was created using the EmbedChain framework, the OpenAI GPT-3.5-Turbo API, and the Gradio UI. The performance of both chatbots was evaluated on 10 questions and their corresponding answers, using the parameters of accuracy, semantic similarity, consistency, and response time. The accuracy scores were 0.672 and 0.675 for the PDF- and webpage-based Q&A chatbots, respectively. The semantic similarity between the two chatbots was 0.928 (92.8%). The consistency score was 1 for both chatbots. The average response times were 3.3 s and 2.38 s for the PDF- and webpage-based chatbots, respectively. The LLM chatbot developed in this study demonstrates the potential to provide reliable responses for radiation therapy education. However, its reliability and efficiency must be further optimized before it can be effectively utilized as an educational tool.
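The semantic-similarity comparison described in the abstract can be sketched in a few lines. The study's exact method is not given here, so the snippet below is a simplified stand-in: it scores two chatbot answers with cosine similarity over bag-of-words counts (real evaluations typically use sentence embeddings instead). The sample answers and the function name `cosine_similarity` are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two answers using bag-of-words counts.

    Simplified stand-in for the embedding-based semantic similarity
    an LLM-chatbot evaluation would normally use.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical answers from the PDF- and webpage-based chatbots.
pdf_answer = "Radiation therapy uses ionizing radiation to treat cancer"
web_answer = "Radiation therapy treats cancer using ionizing radiation"
print(round(cosine_similarity(pdf_answer, web_answer), 3))
```

Averaging such scores over the 10 question-answer pairs would yield an aggregate similarity figure comparable in spirit to the 0.928 reported in the abstract.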
ISSN:2078-2489