Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection
| dc.Affiliation | October University for Modern Sciences and Arts (MSA) | |
| dc.contributor.author | Menna Elgabry | |
| dc.contributor.author | Ali Hamdi | |
| dc.date.accessioned | 2026-06-02T07:21:43Z | |
| dc.date.issued | 2026-05-01 | |
| dc.description | SJR 2025: 0.119 (Q4); H-Index: 40; Subject Area and Category: Computer Science (Computer Networks and Communications; Computer Science Applications; Information Systems); Engineering (Electrical and Electronic Engineering; Media Technology) | |
| dc.description.abstract | This paper introduces a confidence-weighted, credibility-aware ensemble framework for text-based emotion detection, inspired by Condorcet’s Jury Theorem (CJT). Unlike conventional ensembles that often rely on homogeneous architectures, our approach combines architecturally diverse small transformer-based large language models (sLLMs), namely BERT, RoBERTa, DistilBERT, DeBERTa, and ELECTRA, each fully fine-tuned for emotion classification. To preserve error diversity, we minimize parameter convergence while taking advantage of the unique biases of each model. A dual-weighted voting mechanism integrates both global credibility (validation F1-score) and local confidence (instance-level probability) to dynamically weight model contributions. Experiments on the DAIR-AI dataset demonstrate that our credibility-confidence ensemble achieves a macro F1-score of 93.5%, surpassing state-of-the-art benchmarks and significantly outperforming large-scale LLMs, including Falcon, Mistral, Qwen, and Phi, even after task-specific Low-Rank Adaptation (LoRA). With only 595M parameters in total, our small LLMs ensemble proves more parameter-efficient and robust than models up to 7B parameters, establishing that carefully designed ensembles of small, fine-tuned models can outperform much larger LLMs in specialized natural language processing (NLP) tasks such as emotion detection. | |
| dc.description.uri | https://www.scimagojr.com/journalsearch.php?q=21100975545&tip=sid&clean=0 | |
| dc.identifier.citation | Elgabry, M., & Hamdi, A. (2026). Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection. Lecture Notes on Data Engineering and Communications Technologies, 170–179. https://doi.org/10.1007/978-3-032-23035-5_16 | |
| dc.identifier.doi | https://doi.org/10.1007/978-3-032-23035-5_16 | |
| dc.identifier.other | https://doi.org/10.1007/978-3-032-23035-5_16 | |
| dc.identifier.uri | https://repository.msa.edu.eg/handle/123456789/6775 | |
| dc.language.iso | en_US | |
| dc.publisher | Springer International Publishing AG | |
| dc.relation.ispartofseries | Lecture Notes on Data Engineering and Communications Technologies; Volume 293, Pages 170-179 | |
| dc.subject | Condorcet’s jury theorem | |
| dc.subject | Emotion detection | |
| dc.subject | Ensemble learning | |
| dc.subject | Small LLMs | |
| dc.subject | Weighted voting | |
| dc.title | Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection | |
| dc.type | Book chapter |
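
The dual-weighted voting mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact combination rule, so this example assumes that each model votes for its top class with a weight equal to its validation F1-score multiplied by its predicted probability for that class. The function name `weighted_vote`, the model names, and all numbers are hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Sequence


def weighted_vote(
    probabilities: Dict[str, Sequence[float]],
    credibility: Dict[str, float],
) -> int:
    """Return the class index with the highest accumulated vote weight.

    Assumption (not confirmed by the source): a model's vote weight is
    credibility (validation F1) times confidence (probability of its top class).
    """
    num_classes = len(next(iter(probabilities.values())))
    scores: List[float] = [0.0] * num_classes
    for model, probs in probabilities.items():
        # Local confidence: the probability the model assigns to its top class.
        predicted = max(range(num_classes), key=lambda c: probs[c])
        confidence = probs[predicted]
        # Global credibility scales the vote of this model.
        scores[predicted] += credibility[model] * confidence
    return max(range(num_classes), key=lambda c: scores[c])


# Hypothetical validation F1-scores for three illustrative models.
credibility = {"model_a": 0.90, "model_b": 0.92, "model_c": 0.88}

# Hypothetical class probabilities for one input over three classes.
# Two models weakly prefer class 0; one model strongly prefers class 1.
probabilities = {
    "model_a": [0.40, 0.35, 0.25],
    "model_b": [0.38, 0.32, 0.30],
    "model_c": [0.02, 0.97, 0.01],
}

winner = weighted_vote(probabilities, credibility)
print(winner)
```

In this made-up case a plain majority vote would pick class 0 (two votes to one), whereas the weighted vote picks class 1, because one highly confident model outweighs two uncertain ones. Other combination rules, such as summing credibility-weighted probabilities over all classes, would be equally consistent with the abstract.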
