Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval.

Text classification plays an essential role in natural language processing and is commonly used in tasks like categorizing news, sentiment analysis, and retrieving relevant information. However, e...

Bibliographic Details
Main Authors: Bowen Zeng, Xianhe Shang, Rong Lu, Yugui Zhang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0325851
_version_ 1839634495839404032
author Bowen Zeng
Xianhe Shang
Rong Lu
Yugui Zhang
collection DOAJ
description Text classification plays an essential role in natural language processing and is commonly used in tasks like categorizing news, sentiment analysis, and retrieving relevant information. However, existing models often struggle to perform well on multi-class tasks or complex documents. To overcome these limitations, we propose the PBX model, which integrates both deep learning and traditional machine learning techniques. By utilizing BERT for text pre-training and combining it with the ConvXGB module for classification, the model significantly boosts performance. Hyperparameters are optimized using Particle Swarm Optimization (PSO), enhancing overall accuracy. We tested the model on several datasets, including 20 Newsgroups, Reuters-21578, and AG News, where it outperformed existing models in accuracy, precision, recall, and F1 score. In particular, the PBX model achieved a remarkable 95.0% accuracy and 94.9% F1 score on the AG News dataset. Ablation experiments further validate the contributions of PSO, BERT, and ConvXGB. Future work will focus on improving performance for smaller or ambiguous categories and expanding its practical use across various applications.
format Article
id doaj-art-db1f37cf25b1423b92b3b5d0f70cb5e1
institution Matheson Library
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-db1f37cf25b1423b92b3b5d0f70cb5e1 | 2025-07-10T05:31:17Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20 | 7 | e0325851 | 10.1371/journal.pone.0325851 | Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval. | Bowen Zeng | Xianhe Shang | Rong Lu | Yugui Zhang | Text classification plays an essential role in natural language processing and is commonly used in tasks like categorizing news, sentiment analysis, and retrieving relevant information. However, existing models often struggle to perform well on multi-class tasks or complex documents. To overcome these limitations, we propose the PBX model, which integrates both deep learning and traditional machine learning techniques. By utilizing BERT for text pre-training and combining it with the ConvXGB module for classification, the model significantly boosts performance. Hyperparameters are optimized using Particle Swarm Optimization (PSO), enhancing overall accuracy. We tested the model on several datasets, including 20 Newsgroups, Reuters-21578, and AG News, where it outperformed existing models in accuracy, precision, recall, and F1 score. In particular, the PBX model achieved a remarkable 95.0% accuracy and 94.9% F1 score on the AG News dataset. Ablation experiments further validate the contributions of PSO, BERT, and ConvXGB. Future work will focus on improving performance for smaller or ambiguous categories and expanding its practical use across various applications. | https://doi.org/10.1371/journal.pone.0325851
title Particle swarm optimization-based NLP methods for optimizing automatic document classification and retrieval.
url https://doi.org/10.1371/journal.pone.0325851