Interpreting Multimodal Fake News Detection Models: An Experimental Study of Performance Factors and Modality Contributions

dc.AffiliationOctober University for Modern Sciences and Arts (MSA)
dc.contributor.authorNoha A. Saad Eldien
dc.contributor.authorWael H. Gomaa
dc.contributor.authorKhaled T. Wassif
dc.contributor.authorHanaa Bayomi
dc.date.accessioned2026-03-06T21:21:15Z
dc.date.issued2026-02-26
dc.descriptionSJR 2024: 0.285 (Q3); H-Index: 58; Subject Area and Category: Computer Science, Computer Science (miscellaneous)
dc.description.abstractThe widespread dissemination of multimodal misinformation requires models that can reason across textual and visual content while remaining interpretable. However, many existing multimodal fusion approaches implicitly assume uniform modality reliability, providing limited transparency into modality contributions. This study introduces TweFuse-W, a lightweight multimodal framework for fine-grained fake-news detection that reframes multimodal fusion as a modality reliability estimation problem, rather than merely merging modalities or explicitly modeling their interactions. TweFuse-W integrates BERTweet-based textual representations with Swin Transformer visual features using a sample-conditioned, learnable weighted-sum gate operating at the modality level, producing global reliability weights without cross-attention overhead. By explicitly parameterizing modality contributions during inference, the proposed approach provides intrinsic interpretability. Experiments on the six-class Fakeddit dataset show that TweFuse-W achieves a macro-F1 score of 0.838, outperforming simple concatenation (macro-F1 = 0.820). Analysis of the learned modality weights confirms meaningful interpretability, with textual representations dominating in Satire, Misleading, False Connection, and Imposter Content (αT = 0.57–0.62), while visual cues exert greater influence in Manipulated Content (αV = 0.51). Overall, these findings demonstrate that adaptive modality weighting enhances both predictive performance and model transparency, serving as a lightweight and interpretable complementary fusion strategy for multimodal fake-news detection.
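The fusion mechanism summarized in the abstract (BERTweet text features and Swin Transformer visual features combined by a sample-conditioned, learnable weighted-sum gate that exposes per-sample reliability weights αT and αV) can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not the paper's released implementation: the feature dimensions (768 for a pooled BERTweet embedding, 1024 for pooled Swin features), the gate's hidden size, and all module names are assumptions.

```python
import torch
import torch.nn as nn


class WeightedSumGate(nn.Module):
    """Sample-conditioned weighted-sum fusion gate (illustrative sketch).

    Projects each modality to a shared space, predicts two softmax
    weights (alpha_T, alpha_V) per sample, and fuses by a weighted sum
    at the modality level, avoiding cross-attention entirely.
    """

    def __init__(self, text_dim=768, image_dim=1024, fused_dim=512, num_classes=6):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, fused_dim)    # e.g. BERTweet pooled output
        self.image_proj = nn.Linear(image_dim, fused_dim)  # e.g. Swin pooled output
        # Small gate network: sees both projected modalities, emits 2 logits.
        self.gate = nn.Sequential(
            nn.Linear(2 * fused_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),
        )
        self.classifier = nn.Linear(fused_dim, num_classes)  # 6 Fakeddit classes

    def forward(self, text_feat, image_feat):
        t = torch.tanh(self.text_proj(text_feat))
        v = torch.tanh(self.image_proj(image_feat))
        # Per-sample modality reliability weights; softmax makes them sum to 1.
        alphas = torch.softmax(self.gate(torch.cat([t, v], dim=-1)), dim=-1)
        alpha_t, alpha_v = alphas[:, :1], alphas[:, 1:]
        fused = alpha_t * t + alpha_v * v       # modality-level weighted sum
        return self.classifier(fused), alphas   # alphas are directly inspectable


# Example: a batch of 4 samples with the assumed feature sizes.
gate = WeightedSumGate()
logits, alphas = gate(torch.randn(4, 768), torch.randn(4, 1024))
print(alphas)  # rows are (alpha_T, alpha_V) and sum to 1
```

Because the gate emits explicit (αT, αV) pairs for every sample, averaging them per class yields the kind of modality-contribution analysis reported in the abstract (e.g., αT ≈ 0.57–0.62 for the text-dominant classes).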
dc.description.urihttps://www.scimagojr.com/journalsearch.php?q=21100867241&tip=sid&clean=0
dc.identifier.citationEldien, N. A. S., Gomaa, W. H., Wassif, K. T., & Bayomi, H. (2026). Interpreting Multimodal Fake News Detection Models: An Experimental Study of Performance Factors and Modality Contributions. International Journal of Advanced Computer Science and Applications, 17(1). https://doi.org/10.14569/ijacsa.2026.0170186
dc.identifier.doihttps://doi.org/10.14569/ijacsa.2026.0170186
dc.identifier.urihttps://repository.msa.edu.eg/handle/123456789/6655
dc.language.isoen_US
dc.publisherScience and Information Organization
dc.relation.ispartofseriesInternational Journal of Advanced Computer Science and Applications; Volume 17, Issue 1, Pages 886-895
dc.subjectadaptive fusion
dc.subjectinterpretable fusion
dc.subjectlightweight multimodal models
dc.subjectmodality reliability modeling
dc.subjectMultimodal fake news detection
dc.titleInterpreting Multimodal Fake News Detection Models: An Experimental Study of Performance Factors and Modality Contributions
dc.typeArticle

Files

Original bundle

Name: Paper_86-Interpreting_Multimodal_Fake_News_Detection_Models.pdf
Size: 688.07 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 51 B
Description: Item-specific license agreed upon to submission