Optimized Quran Passage Retrieval Using an Expanded QA Dataset and Fine-Tuned Language Models
Date
2025-06-26
Type
Book chapter
Publisher
Springer Science and Business Media Deutschland GmbH
Series Info
Lecture Notes on Data Engineering and Communications Technologies; Volume 255, Pages 244–254
Abstract
Understanding the deep meanings of the Qur’an and bridging the language gap between Modern Standard Arabic and Classical Arabic are essential to improving question-answering (QA) systems for the Holy Qur’an. The Qur’an QA 2023 shared task dataset had a limited number of questions, and models trained on it showed weak retrieval performance. To address this challenge, this work updates the original dataset and improves model accuracy. The original dataset of 251 questions was reviewed and expanded to 629 questions through question diversification and reformulation, yielding a comprehensive set of 1,895 entries categorized into single-answer, multi-answer, and zero-answer types. Extensive experiments fine-tuned transformer models including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The best model, AraBERT-base, achieved a mean average precision at 10 (MAP@10) of 0.36 and a mean reciprocal rank (MRR) of 0.59, improvements of 63% and 59%, respectively, over the baseline scores (MAP@10: 0.22, MRR: 0.37). In addition, the dataset expansion improved the handling of “no answer” cases: the proposed approach succeeded on 75% of such instances, compared with the baseline’s 25%. These results demonstrate the effect of dataset improvement and model architecture optimization on the performance of QA systems for the Holy Qur’an, with higher accuracy, recall, and precision.
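The two retrieval metrics quoted in the abstract, and the relative gains over the baseline, can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions of AP@k (normalized by min(|relevant|, k)) and reciprocal rank; it is not the official Qur’an QA 2023 scorer, which may differ in details such as how zero-answer questions are scored. The function names and the toy passage IDs ("p1", "p3", ...) are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Set, Tuple

# One query = (system ranking of passage IDs, set of gold passage IDs).
Query = Tuple[Sequence[str], Set[str]]


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """AP@k: sum of precision@i at each rank i <= k holding a relevant passage,
    divided by min(|relevant|, k). Assumes `ranked` has no duplicate IDs."""
    if not relevant:
        # Assumption: zero-answer questions score 0 here; the shared-task
        # scorer handles them with its own convention.
        return 0.0
    hits = 0
    total = 0.0
    for i, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            total += hits / i  # precision at rank i
    return total / min(len(relevant), k)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0 if none is retrieved."""
    for i, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / i
    return 0.0


def map_at_k(queries: List[Query], k: int = 10) -> float:
    """MAP@k: mean of AP@k over all queries."""
    return mean(average_precision_at_k(r, g, k) for r, g in queries)


def mrr(queries: List[Query]) -> float:
    """MRR: mean of reciprocal ranks over all queries."""
    return mean(reciprocal_rank(r, g) for r, g in queries)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical toy rankings, not data from the paper.
    queries = [(["p3", "p1", "p7"], {"p1", "p7"}), (["p2", "p5"], {"p2"})]
    print(map_at_k(queries), mrr(queries))
    # Reported scores from the abstract vs. the baseline:
    print(round(relative_improvement(0.36, 0.22), 1))  # 63.6
    print(round(relative_improvement(0.59, 0.37), 1))  # 59.5
```

Averaging per-query scores in this way is what turns AP@10 and reciprocal rank into MAP@10 and MRR. Applied to the reported scores, the relative gains come out at about 63.6% and 59.5%, consistent with the 63% and 59% stated in the abstract.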
Keywords
Quran Question Answering, Passage Retrieval, Modern Standard Arabic
Citation
Basem, M., Oshallah, I., Hikal, B., Hamdi, A., & Mohamed, A. (2025). Optimized Quran passage retrieval using an expanded QA dataset and fine-tuned language models. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 255, pp. 244–254). https://doi.org/10.1007/978-3-031-91354-9_20