Text classification using SVD, BERT, and GRU optimized by improved Seagull optimization (ISO) algorithm
Main Authors:
Format: Article
Language: English
Published: AIP Publishing LLC, 2025-06-01
Series: AIP Advances
Online Access: http://dx.doi.org/10.1063/5.0270185
Summary: Text classification is a key task in natural language processing that entails sorting textual information into specified categories. Over time, techniques for text classification have progressed from rule-based methods to more advanced machine learning and deep learning approaches. Conventional approaches often struggled with linguistic intricacies, including contextual links, polysemy, and ambiguity among words. Neural networks, by contrast, have greatly enhanced text classification by identifying intricate relationships and patterns within text data. Despite these advancements, text classification continues to face challenges, especially when dealing with high-dimensional and large-scale datasets, grasping the contextual meanings of words, and capturing sequential dependencies. In the present research, a Gated Recurrent Unit (GRU) optimized by the Improved Seagull Optimization (ISO) algorithm was utilized to address these issues, resulting in notable improvements in classification performance.

The methodology comprised several phases to guarantee optimum results. Preprocessing was an essential phase, which included addressing missing data, removing special characters and punctuation, expanding contractions, normalizing text, removing noise, and removing stopwords. Dimensionality reduction was performed with Singular Value Decomposition (SVD), which reduced the feature set while retaining only the most pertinent information. Contextual embeddings were created using BERT, which offered rich semantic representations of the text and further improved the quality of the input features. Finally, the GRU was employed for classification, with ISO used for optimization. This integration of preprocessing, feature extraction, dimensionality reduction, and optimized classification proved highly effective in overcoming the challenges of text classification.
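The record gives no implementation details, but the two core computations the summary names, rank-k truncation via SVD and the GRU recurrence, can be sketched in NumPy. Everything below is illustrative: random vectors stand in for BERT token embeddings, and all dimensions (32-dim inputs, rank 8, 16 hidden units) are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_svd(X, k):
    """Project the row vectors of X onto their top-k singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # rank-k representation of each row

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: update gate z, reset gate r, candidate state h_tilde."""
    def __init__(self, input_dim, hidden_dim):
        def w(rows, cols):
            return rng.normal(0.0, 0.1, (rows, cols))
        self.Wz, self.Uz = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.Wr, self.Ur = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.Wh, self.Uh = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)

    def step(self, x, h_prev):
        z = sigmoid(self.Wz @ x + self.Uz @ h_prev)        # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h_prev)        # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h_prev))
        return (1.0 - z) * h_prev + z * h_tilde            # new hidden state

# Stand-in for BERT output: 6 "documents", each a sequence of 5 token
# vectors of dimension 32 (real BERT vectors would be 768-dimensional).
docs = rng.normal(size=(6, 5, 32))

# Reduce the token dimension with SVD, then encode each sequence with the GRU.
flat = docs.reshape(-1, 32)                   # (30, 32) token matrix
reduced = truncated_svd(flat, 8).reshape(6, 5, 8)

cell = GRUCell(input_dim=8, hidden_dim=16)
states = []
for seq in reduced:
    h = np.zeros(16)
    for tok in seq:
        h = cell.step(tok, h)
    states.append(h)                          # final state summarizes the document
states = np.array(states)                     # (6, 16) features for a classifier head
```

Because each new hidden state is a convex combination of the previous state and a tanh candidate, the GRU's document representations stay bounded in (-1, 1), which is one reason the final state is a convenient fixed-size feature vector for classification.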
The proposed model demonstrated improved performance on the Yelp-5 and Yelp-2 datasets, achieving mean accuracy, precision, recall, and F1-score values of 98.47%, 98.71%, 98.92%, and 98.81%, respectively. These outcomes highlight the model's reliability and robustness, as it exceeded all baseline models, namely GRU, BiGRU, BiLSTM, KNN, LSTM, and CNN. The considerable improvement over these networks demonstrates the proposed method's capability in addressing intricate text classification tasks. In conclusion, this work presented a strong architecture for text classification that integrates preprocessing techniques, feature extraction using BERT, dimensionality reduction via SVD, and a GRU optimized by ISO for classification.
ISSN: 2158-3226