A GPT-Based Code Review System With Accurate Feedback for Programming Education

The increasing demand for programming education and growing class sizes require immediate and personalized feedback. However, integrating Large Language Models (LLMs) like ChatGPT in introductory programming courses raises concerns about AI-assisted cheating. In large-scale settings, faulty code sub...

Full description

Saved in:
Bibliographic Details
Main Authors: Dong-Kyu Lee, Inwhee Joe
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/11039773/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The increasing demand for programming education and growing class sizes require immediate and personalized feedback. However, integrating Large Language Models (LLMs) like ChatGPT in introductory programming courses raises concerns about AI-assisted cheating. In large-scale settings, faulty code submissions may lead LLMs to overanalyze, causing unnecessary token consumption. This paper proposes a GPT-4o-based code review system that provides accurate feedback while reducing token usage and preventing AI-assisted cheating. Unlike general-purpose LLM tools for professionals, the system is pedagogically designed for primary and secondary students by focusing on review necessity and learner-friendly feedback. The system features a Code Review Module (CRM) that reduces token usage via a Review Necessity Chain (RNC), and Code Correctness Check Module (CCM) combining test case validation with LLM-based assessment. To prevent AI-assisted cheating, the system provides automated feedback on submitted code without prompting and revealing correct answers, which are accessed only through the “Ask Code Tutor” button. In usability test, the system detected up to 42.86% more errors than a conventional online judge. BERTScore analysis showed that over 80% of the system-generated reviews were semantically aligned with human feedback. A performance comparison with state-of-the-art systems demonstrated a blocking success rate of 86%, with a comparable review omission rate. These results indicate that the system provides more accurate feedback than conventional automated code reviews, while achieving token efficiency and supporting self-directed learning through educational feedback. Thus, it can serve as a practical solution for scalable programming education in primary and secondary classes.
ISSN:2169-3536