A novel approach for mitigating class imbalance in Arabic text classification
| dc.Affiliation | October University for Modern Sciences and Arts (MSA) | |
| dc.contributor.author | EMAD NABIL | |
| dc.contributor.author | ABDELRAHMAN E. NAGIB | |
| dc.contributor.author | MENA HANY | |
| dc.contributor.author | SAFIULLAH FAIZULLAH | |
| dc.contributor.author | WAEL HASSAN GOMAA | |
| dc.date.accessioned | 2025-09-20T12:05:48Z | |
| dc.date.issued | 2025-09-01 | |
| dc.description | SJR 2024: 0.849; Q1; H-Index: 290 | |
| dc.description.abstract | Natural language processing (NLP) has gained prominence through its many applications, with deep neural networks driving major advances. Difficulties remain, however, especially in Arabic NLP, where the language’s large vocabulary of over 12 million words and its several dialects pose particular challenges. Despite Arabic's large speaker base, NLP research in the language faces obstacles, particularly class imbalance. Standard class-balancing methods often overlook intra-class similarity, a crucial factor influencing model training. To bridge this gap, we present a new approach that computes intra-class similarity using cosine similarity and embedding models to find ideal class weights for model training. We evaluated the proposed approach on two benchmark datasets: the Arabic Semantic Question Similarity dataset (NSURL) and the Microsoft Research Paraphrase Corpus (MRPC). It achieved state-of-the-art accuracy of 83.25% on the MRPC dataset and 96.931% on the NSURL dataset, demonstrating its effectiveness in improving model performance in Arabic text classification. | |
| dc.description.uri | https://www.scimagojr.com/journalsearch.php?q=21100374601&tip=sid&clean=0 | |
| dc.identifier.citation | Nabil, E., Nagib, A. E., Hany, M., Faizullah, S., & Gomaa, W. H. (2025). A novel approach for mitigating class imbalance in Arabic text classification. IEEE Access, 1. https://doi.org/10.1109/access.2025.3604427 | |
| dc.identifier.doi | https://doi.org/10.1109/access.2025.3604427 | |
| dc.identifier.other | https://doi.org/10.1109/access.2025.3604427 | |
| dc.identifier.uri | https://repository.msa.edu.eg/handle/123456789/6522 | |
| dc.language.iso | en_US | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc | |
| dc.relation.ispartofseries | IEEE Access; Volume 13, Pages 152870-152889 | |
| dc.subject | class weights | |
| dc.subject | classification | |
| dc.subject | intra-class similarity | |
| dc.subject | Natural language processing | |
| dc.subject | semantic similarity | |
| dc.subject | Transformers | |
| dc.title | A novel approach for mitigating class imbalance in Arabic text classification | |
| dc.type | Article |
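
The abstract describes deriving class weights from intra-class similarity, computed with cosine similarity over embeddings. The record does not give the paper's actual formula, so the following is only a minimal sketch under stated assumptions: the function names are hypothetical, the embeddings are assumed to come from any sentence-embedding model, and the weighting rule (an inverse-frequency term scaled up for classes whose members are less similar to each other) is an illustrative choice, not the authors' method.

```python
import numpy as np


def intra_class_similarity(embeddings, labels):
    """Mean pairwise cosine similarity among the embeddings of each class."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    result = {}
    for cls in np.unique(labels).tolist():
        members = unit[labels == cls]
        n = len(members)
        if n < 2:
            # A single example has no pairs; treat it as fully self-similar.
            result[cls] = 1.0
            continue
        sims = members @ members.T
        # Average over ordered pairs, excluding the diagonal (self-similarity).
        result[cls] = float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
    return result


def similarity_class_weights(embeddings, labels):
    """Illustrative rule (an assumption, not the paper's formula):
    weight = inverse-frequency term * (1 + (1 - intra-class similarity)).
    """
    labels = np.asarray(labels)
    sims = intra_class_similarity(embeddings, labels)
    n_total = len(labels)
    n_classes = len(sims)
    weights = {}
    for cls, sim in sims.items():
        count = int((labels == cls).sum())
        balanced = n_total / (n_classes * count)  # inverse-frequency heuristic
        weights[cls] = balanced * (1.0 + (1.0 - sim))
    return weights
```

The resulting dictionary could then be passed to whatever per-class weighting mechanism a training framework offers, for example the weight vector of a weighted cross-entropy loss.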
