A novel approach for mitigating class imbalance in Arabic text classification

Publisher

Institute of Electrical and Electronics Engineers Inc

Series Info

IEEE Access; Volume 13, Pages 152870-152889

Abstract

Natural language processing (NLP) has been widely adopted because of its many applications, and deep neural networks have driven major advances. Difficulties remain, however, especially in Arabic NLP, where the language’s large vocabulary of over 12 million words and its several dialects pose particular problems. Although Arabic has a large speaker base, NLP research in the language faces challenges, notably class imbalance. Standard class-balancing methods often overlook intra-class similarity, a crucial factor in model training. To bridge this gap, we present a new approach that computes intra-class similarity using cosine similarity and embedding models to find ideal class weights for model training. We evaluated the proposed approach on two benchmark datasets: the Arabic Semantic Question Similarity dataset (NSURL) and the Microsoft Research Paraphrase Corpus (MRPC). It achieved state-of-the-art accuracy of 83.25% on the MRPC dataset and 96.931% on the NSURL dataset, showing that it improves model performance in Arabic text classification.
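
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: measure how similar the sentence embeddings within each class are (mean pairwise cosine similarity), then turn that into per-class loss weights. The function names and the specific rule combining similarity with class frequency are assumptions made for illustration, not the authors' method; the embeddings are assumed to come from any sentence-embedding model.

```python
import numpy as np

def intra_class_similarity(embeddings, labels):
    """Mean pairwise cosine similarity among the embeddings of each class."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    result = {}
    for cls in np.unique(labels):
        members = unit[labels == cls]
        n = len(members)
        if n < 2:
            # A class with one sample has no pairs; treat it as fully similar.
            result[cls.item()] = 1.0
            continue
        sims = members @ members.T
        # Average the off-diagonal entries only (exclude self-similarity).
        result[cls.item()] = float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
    return result

def class_weights(embeddings, labels):
    """Hypothetical weighting rule (an assumption, not taken from the paper):
    inverse class frequency scaled by (2 - similarity), so that rarer and
    more internally diverse classes receive larger weights. The weights are
    rescaled to have a mean of 1."""
    labels = np.asarray(labels)
    similarity = intra_class_similarity(embeddings, labels)
    raw = {
        cls: (2.0 - sim) / int(np.sum(labels == cls))
        for cls, sim in similarity.items()
    }
    mean_raw = sum(raw.values()) / len(raw)
    return {cls: value / mean_raw for cls, value in raw.items()}

# Toy example with 2-dimensional stand-ins for sentence embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_labels = [0, 0, 1, 1]
toy_similarity = intra_class_similarity(toy_embeddings, toy_labels)
toy_weights = class_weights(toy_embeddings, toy_labels)
```

The resulting dictionary could be passed as per-class weights to a weighted loss function; any other monotone mapping from similarity to weight would fit the same description equally well.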

Description

SJR 2024: 0.849; Quartile: Q1; H-Index: 290

Citation

Nabil, E., Nagib, A. E., Hany, M., Faizullah, S., & Gomaa, W. H. (2025). A novel approach for mitigating class imbalance in Arabic text classification. IEEE Access, 13, 152870–152889. https://doi.org/10.1109/access.2025.3604427
