An enhanced approach for automatic annotation of error codes based on Seq2edit

The deep natural language translation models have been used for automatic code error correction and have demonstrated outstanding potential. However, a large and accurately annotated training dataset is essential for these models to perform well. The key to improving the performance of these models...

Full description

Saved in:
Bibliographic Details
Main Authors: Jian Wang, Tao Lin, Rongsen Zhao, Huiling Zhao
Format: Article
Language:English
Published: PeerJ Inc. 2025-07-01
Series:PeerJ Computer Science
Subjects:
Online Access:https://peerj.com/articles/cs-3024.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The deep natural language translation models have been used for automatic code error correction and have demonstrated outstanding potential. However, a large and accurately annotated training dataset is essential for these models to perform well. The key to improving the performance of these models lies in automatically and accurately annotating code errors and establishing a larger training dataset. Recently, a code error automatic annotation method based on Seq2edit has been proposed to optimize the dataset. However, the accuracy of the annotation is affected because tokens in the input code from the same statement may be aligned to different statements. This article proposes a Seq2edit annotation method based on the source code’s sentence structure. By dividing the code into statements with independent meanings and introducing a cost coefficient to improve the Levenshtein algorithm, this method optimizes the calculation of edit distance and enhances the ability to align tokens. Experimental results show that this method can fully utilize the contextual information of the source code during the automatic annotation process, leading to a significant improvement in annotation accuracy.
ISSN:2376-5992