Multihead Average Pseudo-Margin Learning for Disaster Tweet Classification

Bibliographic Details
Main Authors:	Iustin Sîrbu, Robert-Adrian Popovici, Traian Rebedea, Ștefan Trăușan-Matu
Format:	Article
Language:	English
Published:	MDPI AG 2025-05-01
Series:	Information
Subjects:	semi-supervised learning; disaster tweet classification; co-training; machine learning; multimodal learning
Online Access:	https://www.mdpi.com/2078-2489/16/6/434

Description
Summary:	During natural disasters, social media platforms, such as X (formerly Twitter), become a valuable source of real-time information, with eyewitnesses and affected individuals posting messages about the resulting damage and the victims. Although this information can be used to streamline the intervention process of local authorities and to achieve a better distribution of available resources, manually annotating these messages is often infeasible due to time and cost constraints. To address this challenge, we explore the use of semi-supervised learning, a technique that leverages both labeled and unlabeled data, to enhance neural models for disaster tweet classification. Specifically, we investigate state-of-the-art semi-supervised learning models and focus on co-training, a less-explored approach in recent years. Moreover, we propose a novel hybrid co-training architecture, Multihead Average Pseudo-Margin, which obtains state-of-the-art results on several classification tasks. Our approach extends the advantages of the voting mechanism from Multihead Co-Training by using the Average Pseudo-Margin (APM) score to improve the quality of the pseudo-labels and self-adaptive confidence thresholds to improve classification on imbalanced data. Our method achieves up to 7.98% accuracy improvement in low-data scenarios and 2.84% improvement when using the entire labeled dataset, reaching 89.55% accuracy on the Humanitarian task and 91.23% on the Informative task. These results demonstrate the potential of our approach in addressing the critical need for automated disaster tweet classification. We have made our code publicly available for future research.
ISSN:	2078-2489

Similar Items