Hate speech detection in Arabic social networks using deep learning and fine-tuned embeddings

Bibliographic Details
Main Authors: Samar Al-Saqqa, Arafat Awajan, Bassam Hammo
Format: Article
Language: English
Published: Growing Science 2025-01-01
Series: International Journal of Data and Network Science
Online Access: https://www.growingscience.com/ijds/Vol9/ijdns_2024_152.pdf
collection DOAJ
description Social media networks let users easily communicate and share their opinions and views, producing massive amounts of user-generated content. This content may contain text that is hateful toward large groups or specific individuals. Most website policies therefore require automatic hate speech detection, and early automatic detection or filtering of such content is critical in online social networks, especially as the volume of user-generated content keeps growing. This paper presents a model intended to improve hate speech detection performance using deep learning models with two types of word embeddings. The first type comprises Arabic embedding models based on Word2Vec, namely AraVec and Mazajak. The second comprises BERT-based embeddings from three pre-trained models: ARABERT, MARBERT, and CAMeLBERT. Models are assessed with metrics common in text classification: precision, recall, accuracy, and F1 score. The experimental results show that fine-tuned Arabic BERT models outperform Word2Vec-based models and that MARBERT outperforms both ARABERT and CAMeLBERT across all deep learning architectures, highlighting its superior ability to classify Arabic text. Additionally, BLSTM models show the highest performance with ARABERT, MARBERT, and CAMeLBERT, achieving an accuracy of 0.9945 with MARBERT.
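The four assessment metrics named in the description can be illustrated with a minimal sketch. It is not taken from the paper: the function name `binary_metrics`, the `BinaryMetrics` container, the label convention (1 = hateful, 0 = not hateful), and the sample labels are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    precision: float
    recall: float
    accuracy: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, accuracy and F1 for one positive class.

    Assumption: labels are comparable values, with `positive` marking the
    hateful class. Undefined ratios (zero denominator) are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(precision, recall, accuracy, f1)


# Hypothetical gold labels and predictions, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
result = binary_metrics(y_true, y_pred)
print(result)
```

With these made-up labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75. On imbalanced hate speech data the four values usually diverge, which is one reason to report them together rather than accuracy alone.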
id doaj-art-c1c91be081c342e9a40a7f9c63ffb1b3
institution Matheson Library
issn 2561-8148
2561-8156
spelling doaj-art-c1c91be081c342e9a40a7f9c63ffb1b3
2025-07-21T21:11:11Z
eng
Growing Science
International Journal of Data and Network Science
2561-8148
2561-8156
2025-01-01
9
3
587
600
10.5267/j.ijdns.2024.8.008
Hate speech detection in Arabic social networks using deep learning and fine-tuned embeddings
Samar Al-Saqqa
Arafat Awajan
Bassam Hammo
https://www.growingscience.com/ijds/Vol9/ijdns_2024_152.pdf