A novel approach for mitigating class imbalance in Arabic text classification

dc.Affiliation: October University for Modern Sciences and Arts (MSA)
dc.contributor.author: EMAD NABIL
dc.contributor.author: ABDELRAHMAN E. NAGIB
dc.contributor.author: MENA HANY
dc.contributor.author: SAFIULLAH FAIZULLAH
dc.contributor.author: WAEL HASSAN GOMAA
dc.date.accessioned: 2025-09-20T12:05:48Z
dc.date.issued: 2025-09-01
dc.description: SJR 2024: 0.849; Q1; H-Index: 290
dc.description.abstract: Natural language processing (NLP) has gained wide prominence because of its many applications, and deep neural networks have driven major advances in the field. Difficulties remain, however, especially in Arabic NLP, where the language's large vocabulary (over 12 million words) and its many dialects pose particular problems. Although Arabic has a large speaker base, NLP research on the language still faces challenges, notably class imbalance. Standard class-balancing methods often overlook intra-class similarity, a crucial factor influencing model training. To bridge this gap, we present a new approach that computes intra-class similarity using cosine similarity over representations from embedding models and uses it to derive ideal class weights for model training. We evaluated the proposed approach on two benchmark datasets: the Arabic Semantic Question Similarity dataset (NSURL) and the Microsoft Research Paraphrase Corpus (MRPC). The approach achieved state-of-the-art accuracies of 83.25% on MRPC and 96.931% on NSURL, showing that it improves model performance in Arabic text classification.
dc.description.uri: https://www.scimagojr.com/journalsearch.php?q=21100374601&tip=sid&clean=0
dc.identifier.citation: Nabil, E., Nagib, A. E., Hany, M., Faizullah, S., & Gomaa, W. H. (2025). A novel approach for mitigating class imbalance in Arabic text classification. IEEE Access, 13, 152870-152889. https://doi.org/10.1109/access.2025.3604427
dc.identifier.doi: https://doi.org/10.1109/access.2025.3604427
dc.identifier.other: https://doi.org/10.1109/access.2025.3604427
dc.identifier.uri: https://repository.msa.edu.eg/handle/123456789/6522
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartofseries: IEEE Access; Volume 13, Pages 152870-152889
dc.subject: class weights
dc.subject: classification
dc.subject: intra-class similarity
dc.subject: Natural language processing
dc.subject: semantic similarity
dc.subject: Transformers
dc.title: A novel approach for mitigating class imbalance in Arabic text classification
dc.type: Article
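The abstract describes computing intra-class similarity with cosine similarity over embeddings and turning it into class weights, but this record does not give the authors' formula. The sketch below is therefore a hypothetical illustration only. It assumes precomputed sentence embeddings (from any embedding model) grouped by class label. It takes the mean pairwise cosine similarity s_c within each class and scales the scikit-learn-style "balanced" inverse-frequency weight N / (K * n_c) by an assumed diversity factor (2 - s_c), so that more internally diverse classes receive more weight. The function names and the weighting rule are assumptions, not the paper's method.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def intra_class_similarity(vectors: List[Vector]) -> float:
    """Mean cosine similarity over all unordered pairs of examples in one class.

    Assumption: a class with a single example is treated as perfectly
    self-similar (1.0), since there are no pairs to compare.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def class_weights(embeddings_by_class: Dict[int, List[Vector]]) -> Dict[int, float]:
    """Hypothetical weighting: inverse frequency scaled by intra-class diversity.

    base_c      = N / (K * n_c)   # "balanced" inverse-frequency weight
    diversity_c = 2 - s_c         # s_c lies in [-1, 1], so the factor is in [1, 3]
    weight_c    = base_c * diversity_c
    """
    if any(len(v) == 0 for v in embeddings_by_class.values()):
        raise ValueError("every class needs at least one embedding")
    total = sum(len(v) for v in embeddings_by_class.values())  # N
    k = len(embeddings_by_class)                                # K
    weights = {}
    for label, vectors in embeddings_by_class.items():
        base = total / (k * len(vectors))
        s_c = intra_class_similarity(vectors)
        weights[label] = base * (2.0 - s_c)
    return weights


if __name__ == "__main__":
    # Toy 2-D "embeddings": label 1 is small and uniform, label 0 is larger and mixed.
    toy = {
        1: [[1.0, 0.0], [1.0, 0.0]],
        0: [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
    }
    print(class_weights(toy))
```

In a training loop, weights like these could be passed, ordered by class index, to a weighted loss such as PyTorch's `torch.nn.CrossEntropyLoss(weight=...)`. Mean pairwise similarity costs O(n^2) comparisons per class, so for large classes one might estimate it from a random sample of pairs instead.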

Files

Original bundle

Name: A_novel_approach_for_mitigating_class_imbalance_in.pdf
Size: 3.48 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 51 B
Format: Item-specific license agreed upon to submission