Optimized Quran Passage Retrieval Using an Expanded QA Dataset and Fine-Tuned Language Models

Date

2025-06-26

Type

Book chapter

Publisher

Springer Science and Business Media Deutschland GmbH

Series Info

Lecture Notes on Data Engineering and Communications Technologies; Volume 255, Pages 244-254

Abstract

Understanding the deep meanings of the Qur’an and bridging the language gap between Modern Standard Arabic and Classical Arabic are essential to improving question-answering systems for the Holy Qur’an. The Qur’an QA 2023 shared task dataset had a limited number of questions, and model retrieval performance on it was weak. To address this challenge, this work updates the original dataset and improves model accuracy. The original dataset, which contains 251 questions, was reviewed and expanded to 629 questions through question diversification and reformulation, leading to a comprehensive set of 1895 categorized into single-answer, multi-answer, and zero-answer types. In extensive experiments, transformer models were fine-tuned, including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The best model, AraBERT-base, achieved a MAP@10 of 0.36 and an MRR of 0.59, representing relative improvements of 63% and 59%, respectively, over the baseline scores (MAP@10: 0.22, MRR: 0.37). Additionally, the dataset expansion improved the handling of “no answer” cases, with the proposed approach achieving a 75% success rate for such instances, compared to the baseline’s 25%. These results demonstrate the effect of dataset improvement and model architecture optimization in increasing the performance of QA systems for the Holy Qur’an, with higher accuracy, recall, and precision.
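
The MAP@10 and MRR figures quoted in the abstract, and the relative improvements derived from them, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the passage identifiers, and the choice to normalize AP@k by min(|relevant|, k) are assumptions made for illustration, and scoring a zero-answer question as 0.0 is a placeholder rather than the shared task's actual rule.

```python
from typing import Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """AP@k for one question: average of precision at each rank holding a relevant passage.

    Assumption: normalized by min(len(relevant), k); other variants divide by len(relevant).
    """
    if not relevant:
        # Placeholder for zero-answer questions; the shared task may score these differently.
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, passage_id in enumerate(ranked[:k], start=1):
        if passage_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, passage_id in enumerate(ranked, start=1):
        if passage_id in relevant:
            return 1.0 / rank
    return 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical ranking for a single question (passage IDs are made up).
ranked_passages = ["p3", "p1", "p7"]
relevant_passages = {"p1", "p7"}
ap = average_precision_at_k(ranked_passages, relevant_passages)
rr = reciprocal_rank(ranked_passages, relevant_passages)

# Scores reported in the abstract: baseline vs. best model.
map_gain = relative_improvement(0.36, 0.22)
mrr_gain = relative_improvement(0.59, 0.37)
print(f"AP@10={ap:.3f} RR={rr:.3f} MAP gain={map_gain:.1f}% MRR gain={mrr_gain:.1f}%")
```

MAP@10 and MRR are the means of these per-question values over the whole question set. With the abstract's scores, the relative gains come out at roughly 63.6% and 59.5%, in line with the reported 63% and 59%.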

Keywords

Quran Question Answering, Passage Retrieval, Modern Standard Arabic

Citation

Basem, M., Oshallah, I., Hikal, B., Hamdi, A., & Mohamed, A. (2025). Optimized Quran passage retrieval using an expanded QA dataset and fine-tuned language models. In Lecture Notes on Data Engineering and Communications Technologies (pp. 244–254). https://doi.org/10.1007/978-3-031-91354-9_20