Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval.
Text classification plays an essential role in natural language processing and is commonly used in tasks like categorizing news, sentiment analysis, and retrieving relevant information. However, existing models often struggle to perform well on multi-class tasks or complex documents...
Main Authors: | Bowen Zeng, Xianhe Shang, Rong Lu, Yugui Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2025-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0325851 |
_version_ | 1839634495839404032 |
---|---|
author | Bowen Zeng Xianhe Shang Rong Lu Yugui Zhang |
author_facet | Bowen Zeng Xianhe Shang Rong Lu Yugui Zhang |
author_sort | Bowen Zeng |
collection | DOAJ |
description | Text classification plays an essential role in natural language processing and is commonly used in tasks like categorizing news, sentiment analysis, and retrieving relevant information. However, existing models often struggle to perform well on multi-class tasks or complex documents. To overcome these limitations, we propose the PBX model, which integrates both deep learning and traditional machine learning techniques. By utilizing BERT for text pre-training and combining it with the ConvXGB module for classification, the model significantly boosts performance. Hyperparameters are optimized using Particle Swarm Optimization (PSO), enhancing overall accuracy. We tested the model on several datasets, including 20 Newsgroups, Reuters-21578, and AG News, where it outperformed existing models in accuracy, precision, recall, and F1 score. In particular, the PBX model achieved a remarkable 95.0% accuracy and 94.9% F1 score on the AG News dataset. Ablation experiments further validate the contributions of PSO, BERT, and ConvXGB. Future work will focus on improving performance for smaller or ambiguous categories and expanding its practical use across various applications. |
format | Article |
id | doaj-art-db1f37cf25b1423b92b3b5d0f70cb5e1 |
institution | Matheson Library |
issn | 1932-6203 |
language | English |
publishDate | 2025-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj-art-db1f37cf25b1423b92b3b5d0f70cb5e12025-07-10T05:31:17ZengPublic Library of Science (PLoS)PLoS ONE1932-62032025-01-01207e032585110.1371/journal.pone.0325851Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval.Bowen ZengXianhe ShangRong LuYugui ZhangText classification plays an essential role in natural language processing and is commonly used in tasks like categorizing news, sentiment analysis, and retrieving relevant information. However, existing models often struggle to perform well on multi-class tasks or complex documents. To overcome these limitations, we propose the PBX model, which integrates both deep learning and traditional machine learning techniques. By utilizing BERT for text pre-training and combining it with the ConvXGB module for classification, the model significantly boosts performance. Hyperparameters are optimized using Particle Swarm Optimization (PSO), enhancing overall accuracy. We tested the model on several datasets, including 20 Newsgroups, Reuters-21578, and AG News, where it outperformed existing models in accuracy, precision, recall, and F1 score. In particular, the PBX model achieved a remarkable 95.0% accuracy and 94.9% F1 score on the AG News dataset. Ablation experiments further validate the contributions of PSO, BERT, and ConvXGB. Future work will focus on improving performance for smaller or ambiguous categories and expanding its practical use across various applications.https://doi.org/10.1371/journal.pone.0325851 |
spellingShingle | Bowen Zeng Xianhe Shang Rong Lu Yugui Zhang Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. PLoS ONE |
title | Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. |
title_full | Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. |
title_fullStr | Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. |
title_full_unstemmed | Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. |
title_short | Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. |
title_sort | particle swarm optimization based nlp methods for optimizing automatic document classification and retrieval |
url | https://doi.org/10.1371/journal.pone.0325851 |
work_keys_str_mv | AT bowenzeng particleswarmoptimizationbasednlpmethodsforoptimizingautomaticdocumentclassificationandretrieval AT xianheshang particleswarmoptimizationbasednlpmethodsforoptimizingautomaticdocumentclassificationandretrieval AT ronglu particleswarmoptimizationbasednlpmethodsforoptimizingautomaticdocumentclassificationandretrieval AT yuguizhang particleswarmoptimizationbasednlpmethodsforoptimizingautomaticdocumentclassificationandretrieval |
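The abstract above describes the PBX pipeline only at a high level: BERT-derived document representations feed a ConvXGB classifier whose hyperparameters are tuned with Particle Swarm Optimization. The sketch below is a minimal, illustrative rendering of that PSO tuning loop, not the authors' implementation: it assumes TF-IDF + TruncatedSVD document vectors as a lightweight stand-in for BERT embeddings, scikit-learn's GradientBoostingClassifier as a stand-in for ConvXGB, and illustrative choices of dataset subset, search bounds, and PSO coefficients.

```python
"""Illustrative sketch only: PSO-style hyperparameter search for a text classifier.

Not the paper's PBX implementation. TF-IDF + SVD vectors stand in for BERT
embeddings, and GradientBoostingClassifier stands in for ConvXGB; dataset,
bounds, and PSO coefficients are assumptions made for this sketch.
"""
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Dense document vectors as a lightweight stand-in for BERT embeddings.
cats = ["sci.space", "rec.sport.hockey", "talk.politics.mideast", "comp.graphics"]
data = fetch_20newsgroups(subset="train", categories=cats,
                          remove=("headers", "footers", "quotes"))
X = TruncatedSVD(n_components=100, random_state=0).fit_transform(
    TfidfVectorizer(max_features=20000, stop_words="english").fit_transform(data.data))
y = data.target

# Search space: (learning_rate, max_depth). Bounds are illustrative choices.
lo = np.array([0.01, 2.0])
hi = np.array([0.30, 6.0])

def fitness(pos):
    """Cross-validated accuracy of the classifier at the given hyperparameters."""
    lr, depth = float(pos[0]), int(round(pos[1]))
    clf = GradientBoostingClassifier(n_estimators=60, learning_rate=lr,
                                     max_depth=depth, random_state=0)
    return cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()

# Plain global-best PSO with standard inertia / cognitive / social weights.
n_particles, n_iters = 5, 4
w, c1, c2 = 0.7, 1.5, 1.5
pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest_pos = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest_pos = pbest_pos[pbest_val.argmax()].copy()
gbest_val = pbest_val.max()

for it in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest_pos[improved], pbest_val[improved] = pos[improved], vals[improved]
    if vals.max() > gbest_val:
        gbest_val, gbest_pos = vals.max(), pos[vals.argmax()].copy()
    print(f"iter {it}: best CV accuracy {gbest_val:.3f} at {gbest_pos}")

print("best hyperparameters:",
      {"learning_rate": round(float(gbest_pos[0]), 3),
       "max_depth": int(round(gbest_pos[1]))})
```

Swapping in actual BERT document embeddings and an XGBoost-based ConvXGB head would only change the feature-extraction step and the `fitness` function; the swarm update itself would stay the same.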