Multimodal Rumor Detection by Online Balance Multimodal Representation Learning

Bibliographic Details
Main Authors: Jianing Ren, Tingting Zhong
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11072677/
Description
Summary: Multimodal approaches have been theoretically and empirically shown to outperform unimodal methods. Paradoxically, leading unimodal architectures sometimes surpass multimodal systems trained in a joint framework. Previous studies have shown that this counterintuitive outcome stems from the disparate convergence speeds and generalization capabilities inherent to different modalities: in joint training, this imbalance can lead to the over-representation of one modality while the others remain underfit. To overcome this limitation in multimodal rumor detection, we employ a training strategy that dynamically scales the logits of each modality. By applying adaptive coefficients, our method normalizes the output magnitudes to better align with the target values, thereby ensuring a more balanced contribution from all modalities. Extensive evaluations on two real-world multimodal datasets confirm that our approach stabilizes the training process and yields an embedding space that effectively discriminates between rumors and verified information.
ISSN: 2169-3536
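
The logit-balancing idea described in the summary can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' published code: the function name `balanced_fusion`, the choice of an L2 norm, the shared `target_norm`, and summation as the fusion rule are all assumptions.

```python
import numpy as np

def balanced_fusion(logits_by_modality, target_norm=1.0, eps=1e-8):
    """Rescale each modality's logits with an adaptive coefficient so
    their magnitudes match a common target before fusing.

    Illustrative sketch only: the paper's exact adaptive coefficients
    and fusion rule are not specified here.
    """
    scaled = []
    for logits in logits_by_modality:
        norm = np.linalg.norm(logits)
        coeff = target_norm / (norm + eps)  # adaptive scaling coefficient
        scaled.append(coeff * logits)
    # Fused prediction: sum of the magnitude-balanced modality logits.
    return sum(scaled)

# Example: a fast-converging text branch would otherwise dominate the
# underfit image branch; after scaling, both contribute equally in norm.
text_logits = np.array([8.0, -6.0])   # large magnitude (dominant modality)
image_logits = np.array([0.2, 0.4])   # small magnitude (underfit modality)
fused = balanced_fusion([text_logits, image_logits])
```

Without the rescaling step, the fused output would be driven almost entirely by the text branch; with it, each modality's logits are normalized to the same magnitude before they are combined.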