Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection

Menna Elgabry; Ali Hamdi

DOI: https://doi.org/10.1007/978-3-032-23035-5_16

dc.Affiliation	October University for Modern Sciences and Arts (MSA)
dc.contributor.author	Menna Elgabry
dc.contributor.author	Ali Hamdi
dc.date.accessioned	2026-06-02T07:21:43Z
dc.date.issued	2026-05-01
dc.description	SJR 2025: 0.119 (Q4); H-index: 40; Subject areas and categories: Computer Science; Computer Networks and Communications; Computer Science Applications; Information Systems; Engineering; Electrical and Electronic Engineering; Media Technology
dc.description.abstract	This paper introduces a confidence-weighted, credibility-aware ensemble framework for text-based emotion detection, inspired by Condorcet’s Jury Theorem (CJT). Unlike conventional ensembles that often rely on homogeneous architectures, our approach combines architecturally diverse small transformer-based large language models (sLLMs): BERT, RoBERTa, DistilBERT, DeBERTa, and ELECTRA, each fully fine-tuned for emotion classification. To preserve error diversity, we minimize parameter convergence while taking advantage of the unique biases of each model. A dual-weighted voting mechanism integrates both global credibility (validation F1-score) and local confidence (instance-level probability) to dynamically weight model contributions. Experiments on the DAIR-AI dataset demonstrate that our credibility-confidence ensemble achieves a macro F1-score of 93.5%, surpassing state-of-the-art benchmarks and significantly outperforming large-scale LLMs, including Falcon, Mistral, Qwen, and Phi, even after task-specific Low-Rank Adaptation (LoRA). With only 595M parameters in total, our small-LLM ensemble proves more parameter-efficient and robust than models of up to 7B parameters, establishing that carefully designed ensembles of small, fine-tuned models can outperform much larger LLMs in specialized natural language processing (NLP) tasks such as emotion detection.
dc.description.uri	https://www.scimagojr.com/journalsearch.php?q=21100975545&tip=sid&clean=0
dc.identifier.citation	Elgabry, M., & Hamdi, A. (2026). Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection. Lecture Notes on Data Engineering and Communications Technologies, 293, 170–179. https://doi.org/10.1007/978-3-032-23035-5_16
dc.identifier.doi	https://doi.org/10.1007/978-3-032-23035-5_16
dc.identifier.other	https://doi.org/10.1007/978-3-032-23035-5_16
dc.identifier.uri	https://repository.msa.edu.eg/handle/123456789/6775
dc.language.iso	en_US
dc.publisher	Springer International Publishing AG
dc.relation.ispartofseries	Lecture Notes on Data Engineering and Communications Technologies; Volume 293, Pages 170–179
dc.subject	Condorcet’s jury theorem
dc.subject	Emotion detection
dc.subject	Ensemble learning
dc.subject	Small LLMs
dc.subject	Weighted voting
dc.title	Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection
dc.type	Book chapter
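The abstract cites Condorcet’s Jury Theorem as its inspiration. In its classical form, the theorem considers n voters (n odd) who each independently pick the correct answer with probability p > 1/2, and gives the probability that a simple majority is correct. The statement below is the textbook version, not a formula taken from the chapter; reading the chapter’s "error diversity" goal as an attempt to approximate the independence assumption is our interpretation.

```latex
P_n \;=\; \sum_{k=(n+1)/2}^{n} \binom{n}{k}\, p^{k} (1-p)^{n-k},
\qquad n \text{ odd},\; p > \tfrac{1}{2},
\qquad P_n \text{ increases with } n \text{ and } \lim_{n\to\infty} P_n = 1 .

% Illustrative arithmetic (hypothetical p, five voters as in a five-model ensemble):
% n = 5, p = 0.7:
% P_5 = \binom{5}{3}(0.7)^3(0.3)^2 + \binom{5}{4}(0.7)^4(0.3) + (0.7)^5
%     = 0.3087 + 0.36015 + 0.16807 = 0.83692
```

With five independent voters that are each right 70% of the time, the majority is right about 83.7% of the time. Correlated errors erode this gain, which is why heterogeneous architectures matter for the argument.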

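The abstract describes a dual-weighted vote that combines each model’s global credibility (validation F1-score) with its local, instance-level confidence. The exact combination rule is not given in this record, so the following is a minimal sketch under an assumed rule: each model votes for its top class with weight `credibility * max_probability`, and the class with the largest summed weight wins. The function name `weighted_vote`, the use of NumPy, and all numbers in the example are hypothetical illustrations, not the authors’ code or results.

```python
"""Hypothetical sketch of confidence-credibility weighted voting.

ASSUMPTION: weight = credibility (validation macro-F1) * confidence
(max softmax probability). The chapter's actual formula may differ.
"""
import numpy as np


def weighted_vote(probs: np.ndarray, credibility: np.ndarray) -> np.ndarray:
    """Combine per-model class probabilities into ensemble predictions.

    probs:       shape (n_models, n_samples, n_classes), softmax outputs.
    credibility: shape (n_models,), e.g. each model's validation macro-F1.
    Returns:     shape (n_samples,), index of the winning class per sample.
    """
    n_models, n_samples, n_classes = probs.shape
    votes = probs.argmax(axis=2)       # each model's predicted label per sample
    confidence = probs.max(axis=2)     # local confidence in that label
    # global credibility (per model) times local confidence (per sample)
    weights = credibility[:, None] * confidence
    scores = np.zeros((n_samples, n_classes))
    rows = np.arange(n_samples)
    for m in range(n_models):
        # add model m's weight to the class it voted for, sample by sample
        np.add.at(scores, (rows, votes[m]), weights[m])
    return scores.argmax(axis=1)


if __name__ == "__main__":
    # 3 hypothetical models x 2 samples x 3 emotion classes
    probs = np.array([
        [[0.95, 0.03, 0.02], [0.10, 0.20, 0.70]],  # model A
        [[0.35, 0.40, 0.25], [0.20, 0.10, 0.70]],  # model B
        [[0.30, 0.40, 0.30], [0.10, 0.10, 0.80]],  # model C
    ])
    credibility = np.array([0.93, 0.90, 0.88])  # made-up validation F1 scores
    print(weighted_vote(probs, credibility))  # -> [0 2]
```

On the first sample, a plain majority vote would choose class 1 (two of three models), but model A’s high confidence and credibility (0.93 × 0.95 = 0.8835) outweigh the two hesitant votes for class 1 (0.36 + 0.352 = 0.712), so the weighted ensemble picks class 0. This shows how instance-level confidence lets a single strong, reliable model override a weak majority.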
Collections

Faculty of Computer Science Research Paper