A novel approach for mitigating class imbalance in Arabic text classification
Publisher
Institute of Electrical and Electronics Engineers Inc.
Series Info
IEEE Access; Volume 13, Pages 152870-152889
Scientific Journal Rankings
Abstract
Natural language processing (NLP) has gained prominence through its wide range of applications, with deep neural networks driving major advances. Difficulties remain, however, especially in Arabic NLP, where the language’s vocabulary of over 12 million words and its several dialects pose particular challenges. Although Arabic has a large speaker base, NLP research in the language faces obstacles, particularly class imbalance. Standard class-balancing methods often overlook intra-class similarity, a crucial factor in model training. To bridge this gap, we present a new approach that computes intra-class similarity using cosine similarity and embedding models to find ideal class weights for model training. We evaluated the proposed approach on two benchmark datasets: the Arabic Semantic Question Similarity dataset (NSURL) and the Microsoft Research Paraphrase Corpus (MRPC). It achieved state-of-the-art accuracy of 83.25% on the MRPC dataset and 96.931% on the NSURL dataset, demonstrating its effectiveness in improving model performance in Arabic text classification.
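The general idea in the abstract, measuring intra-class similarity with cosine similarity over embeddings and turning it into class weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is mapped to weights, so the function names `intra_class_similarity` and `class_weights`, the `eps` parameter, and the mapping itself (inverse class frequency scaled by one minus the mean pairwise similarity) are assumptions made purely for illustration. The embeddings here are random toy vectors standing in for the output of any sentence-embedding model.

```python
import numpy as np


def intra_class_similarity(embeddings, labels):
    """Mean pairwise cosine similarity among the samples of each class."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = {}
    for c in np.unique(labels):
        x = unit[labels == c]
        n = len(x)
        if n < 2:
            # A single sample is trivially similar to itself.
            sims[c.item()] = 1.0
            continue
        gram = x @ x.T
        # Average over off-diagonal entries only (exclude self-similarity).
        sims[c.item()] = float((gram.sum() - np.trace(gram)) / (n * (n - 1)))
    return sims


def class_weights(embeddings, labels, eps=1e-6):
    """Hypothetical mapping (an assumption, not taken from the paper):
    inverse class frequency scaled by (1 - intra-class similarity),
    normalised so that the weights average to 1."""
    sims = intra_class_similarity(embeddings, labels)
    classes, counts = np.unique(labels, return_counts=True)
    raw = np.array([
        max(1.0 - sims[c.item()], eps) / n
        for c, n in zip(classes, counts)
    ])
    raw = raw / raw.mean()
    return {c.item(): float(w) for c, w in zip(classes, raw)}


# Toy data: 90 majority-class and 10 minority-class random embeddings.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(100, 16))
toy_labels = np.array([0] * 90 + [1] * 10)
print(intra_class_similarity(toy_embeddings, toy_labels))
print(class_weights(toy_embeddings, toy_labels))
```

The resulting dictionary could then be passed to a weighted loss function during training. Under this assumed mapping, a small class whose samples are diverse receives a larger weight than a large class of near-duplicates; the paper itself may combine the two quantities differently.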
Description
SJR 2024
0.849
Q1
H-Index
290
Citation
Nabil, E., Nagib, A. E., Hany, M., Faizullah, S., & Gomaa, W. H. (2025). A novel approach for mitigating class imbalance in Arabic text classification. IEEE Access, 13, 152870-152889. https://doi.org/10.1109/access.2025.3604427
