Automatic text generation system for endangered languages based on conditional generative adversarial networks
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-12-01 |
Series: | Systems and Soft Computing |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772941925001243 |
Summary: | This paper applies Conditional Generative Adversarial Networks (CGANs) to text generation for endangered languages, focusing on the difficulty of handling discrete data in natural language generation. We introduce a specialized loss function, based on the MaliGAN model, in which the discriminator guides the generator toward text that is accurate at the level of individual words and also semantically coherent overall. A beam search decoding strategy is added to improve the global semantic information and diversity of the output. Experiments on multiple datasets, including the Tujia language, Image_COCO, and EMNLP2017 WMT News, show significant improvements: the LFMGAN model, a CGAN variant, raised BLEU-4 scores by up to 50.7% for the Tujia language and ROUGE-L scores by up to 86.3% on the Image_COCO dataset. These results underscore the model's robustness and its potential for preserving linguistic diversity. We discuss integrating advanced models such as GPT-2 and RoBERTa to address training instability and gradient explosion. Future research directions include optimizing CGAN parameters with algorithms such as particle swarm optimization, refining how discriminator outputs enter the loss calculation, and incorporating cultural and linguistic features specific to endangered languages to improve the quality of the generated text. |
ISSN: | 2772-9419 |
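
The summary mentions a beam search decoding strategy but this record does not describe how the paper implements it. As an illustration only, the following is a minimal sketch of generic beam search over a next-token scoring function. The names `beam_search`, `next_log_probs`, and `toy_model`, along with the toy probabilities, are hypothetical and are not taken from the article or from the LFMGAN model.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    eos: str = "<eos>",
) -> List[Tuple[Sequence, float]]:
    """Generic beam search (illustrative; not the paper's decoder).

    next_log_probs maps a token prefix to {token: log-probability}.
    Returns up to beam_width (sequence, total log-probability) pairs,
    best first.
    """
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    finished: List[Tuple[Sequence, float]] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (seq + (token,), score + logp)
            for seq, score in beams
            for token, logp in next_log_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)

        # Keep the best beam_width candidates; set aside those that ended.
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break

    # Hypotheses cut off at max_len still compete with finished ones.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical stand-in for a generator's next-token distribution."""
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.7), "<eos>": math.log(0.3)},
        ("b",): {"<eos>": math.log(0.9), "x": math.log(0.1)},
    }
    # Any prefix not in the table ends the sequence with probability 1.
    return table.get(prefix, {"<eos>": 0.0})


results = beam_search(toy_model, beam_width=2, max_len=4)
for tokens, log_prob in results:
    print(" ".join(tokens), round(math.exp(log_prob), 2))
```

In a real system `next_log_probs` would wrap the trained generator, and scores are often length-normalized so that longer sequences are not penalized; both choices are outside what this record states.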