TDQE: a quality evaluation method for text data in deep learning
Text data quality is an important factor affecting the performance of language models, and its evaluation methodology is considered decisive for model training effectiveness. To address the issues of high computational costs and incomplete evaluation metrics in existing text data quality assessment...
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | Chinese |
Published: | China InfoCom Media Group, 2025-01-01 |
Series: | 大数据 (Big Data Research) |
Subjects: | |
Online Access: | http://www.j-bigdataresearch.com.cn/zh/article/111999072/ |
Summary: | Text data quality is an important factor affecting the performance of language models, and its evaluation methodology is considered decisive for model training effectiveness. To address the issues of high computational costs and incomplete evaluation metrics in existing text data quality assessment methods, a deep learning-oriented text data quality evaluation (TDQE) method was proposed. Specifically, (1) the Dropout module of a text summarization model was utilized to generate multiple stochastic sub-networks, producing embedded representations of data samples to capture semantic consistency, thereby evaluating sample robustness; (2) a text similarity matching model was employed to compute the alignment between data samples and their summaries, assessing sample accuracy; (3) weighted robustness and accuracy metrics were designed to quantify overall text data quality. Comparative experiments against state-of-the-art methods were conducted on public datasets, and the results showed that TDQE outperformed existing mainstream algorithms. |
---|---|
ISSN: | 2096-0271 |
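
The three steps described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the formulas, so everything below is an assumption made for illustration. Dropout is simulated by random masks on a fixed vector rather than by a real summarization model, robustness is taken to be the mean pairwise cosine similarity of the stochastic embeddings, accuracy is taken to be the cosine similarity between a sample vector and its summary vector, and the weight `w` and all function names are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def dropout_embeddings(base, k, p, rng):
    """Stand-in for K stochastic sub-networks: apply K random dropout masks
    to one base embedding. A real system would instead run K forward passes
    of a summarization model with dropout left active (assumption)."""
    masks = rng.random((k, base.shape[0])) >= p  # keep each unit with prob. 1 - p
    return [base * m / (1.0 - p) for m in masks]


def robustness(embeddings):
    """Assumed robustness score: mean pairwise cosine similarity
    across the K stochastic embeddings of one sample."""
    k = len(embeddings)
    sims = [cosine(embeddings[i], embeddings[j])
            for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(sims))


def quality(embeddings, sample_vec, summary_vec, w=0.5):
    """Assumed overall score: weighted sum of robustness and accuracy,
    where accuracy is the sample-to-summary cosine similarity."""
    r = robustness(embeddings)
    a = cosine(sample_vec, summary_vec)
    return w * r + (1.0 - w) * a


# Usage with synthetic vectors (no real model or dataset involved).
rng = np.random.default_rng(0)
sample_vec = rng.normal(size=64)
summary_vec = sample_vec + 0.1 * rng.normal(size=64)
embs = dropout_embeddings(sample_vec, k=5, p=0.1, rng=rng)
score = quality(embs, sample_vec, summary_vec, w=0.5)
```

Under these assumptions, a sample whose embeddings stay close together across dropout masks and whose summary remains close to the original text receives a higher score; the weighting actually used in the article may differ.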