Optimized Quran Passage Retrieval Using an Expanded QA Dataset and Fine-Tuned Language Models

dc.Affiliation: October University for Modern Sciences and Arts (MSA)
dc.contributor.author: Mohamed Basem
dc.contributor.author: Islam Oshallah
dc.contributor.author: Baraa Hikal
dc.contributor.author: Ali Hamdi
dc.contributor.author: Ammar Mohamed
dc.date.accessioned: 2025-07-17T08:42:48Z
dc.date.available: 2025-07-17T08:42:48Z
dc.date.issued: 2025-06-26
dc.description.abstract: Understanding the deep meanings of the Qur’an and bridging the language gap between Modern Standard Arabic and Classical Arabic are essential to improving question-answering systems for the Holy Qur’an. The Qur’an QA 2023 shared task dataset had a limited number of questions, and models trained on it retrieved passages poorly. To address this challenge, this work updates the original dataset and improves model accuracy. The original dataset, which contains 251 questions, was reviewed and expanded to 629 questions through question diversification and reformulation, leading to a comprehensive set of 1895 categorized into single-answer, multi-answer, and zero-answer types. Extensive experiments were run with fine-tuned transformer models, including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The paper's best model, AraBERT-base, achieved a MAP@10 of 0.36 and an MRR of 0.59, relative improvements of 63% and 59%, respectively, over the baseline scores (MAP@10: 0.22, MRR: 0.37). Additionally, the dataset expansion improved the handling of "no answer" cases, with the proposed approach achieving a 75% success rate for such instances, compared to the baseline's 25%. These results demonstrate the effect of dataset improvement and model architecture optimization in increasing the performance of QA systems for the Holy Qur’an, with higher accuracy, recall, and precision.
dc.identifier.citation: Basem, M., Oshallah, I., Hikal, B., Hamdi, A., & Mohamed, A. (2025). Optimized Quran passage retrieval using an expanded QA dataset and fine-tuned language models. In Lecture Notes on Data Engineering and Communications Technologies (pp. 244–254). https://doi.org/10.1007/978-3-031-91354-9_20
dc.identifier.doi: https://doi.org/10.1007/978-3-031-91354-9_20
dc.identifier.other: https://doi.org/10.1007/978-3-031-91354-9_20
dc.identifier.uri: https://repository.msa.edu.eg/handle/123456789/6472
dc.language.iso: en_US
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.relation.ispartofseries: Lecture Notes on Data Engineering and Communications Technologies; Volume 255, Pages 244-254
dc.subject: Quran Question Answering
dc.subject: Passage Retrieval
dc.subject: Modern Standard Arabic
dc.title: Optimized Quran Passage Retrieval Using an Expanded QA Dataset and Fine-Tuned Language Models
dc.type: Book chapter
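
The abstract reports results as MAP@10 and MRR. As a reading aid, the following is a minimal sketch of how these two ranking metrics are commonly computed. It is not the authors' evaluation code: the function names, the passage identifiers, and the choice to normalize average precision by min(number of relevant passages, k) are assumptions made for illustration, and the shared task's official scorer may differ (for example in how zero-answer questions are scored). Each ranked list is assumed to contain no duplicate passage IDs.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, passage_id in enumerate(ranked, start=1):
        if passage_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values at each relevant hit within the top k.

    Assumption: normalize by min(len(relevant), k); other conventions exist.
    """
    if not relevant:
        return 0.0  # assumption: questions without gold passages score 0 here
    hits = 0
    precision_sum = 0.0
    for rank, passage_id in enumerate(ranked[:k], start=1):
        if passage_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean; MAP and MRR are means of the per-question scores."""
    return sum(values) / len(values) if values else 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Hypothetical single-question example: the gold passages appear at ranks 2 and 3.
ranked_example = ["p3", "p1", "p7"]
relevant_example = {"p1", "p7"}
rr = reciprocal_rank(ranked_example, relevant_example)        # 1/2
ap = average_precision_at_k(ranked_example, relevant_example)  # (1/2 + 2/3) / 2

# Relative gains implied by the scores quoted in the abstract.
map_gain = relative_improvement(0.36, 0.22)
mrr_gain = relative_improvement(0.59, 0.37)
```

With the scores quoted in the abstract, `map_gain` is about 0.636 and `mrr_gain` about 0.595, which matches the reported 63% and 59% when read as relative improvements over the baseline.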

Files

Original bundle

Name: 2412.11431v1.pdf
Size: 402.5 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 51 B
Format: Item-specific license agreed upon to submission