Fusing CNNs and attention-mechanisms to improve real-time indoor Human Activity Recognition for classifying home-based physical rehabilitation exercises
Date
2025-01-01
Type
Article
Publisher
Elsevier Ltd
Series Info
Computers in Biology and Medicine ; 184 (2025) 109399
Abstract
Physical rehabilitation plays a critical role in enhancing health outcomes globally. However, the shortage of
physiotherapists, particularly in developing countries where the ratio is approximately ten physiotherapists
per million people, poses a significant challenge to effective rehabilitation services. The existing literature
on rehabilitation often falls short in data representation and the employment of diverse modalities, limiting
the potential for advanced therapeutic interventions. To address this gap, this study integrates Computer
Vision and Human Activity Recognition (HAR) technologies to support home-based rehabilitation, exploring
various modalities and proposing a framework for data representation. We
introduce a novel framework that leverages both Continuous Wavelet Transform (CWT) and Mel-Frequency
Cepstral Coefficients (MFCC) for skeletal data representation. CWT is particularly valuable for capturing
the time-frequency characteristics of dynamic movements involved in rehabilitation exercises, enabling a
comprehensive depiction of both temporal and spectral features. This dual capability is crucial for accurately
modelling the complex and variable nature of rehabilitation exercises. In our analysis, we evaluate 20
CNN-based models and one Vision Transformer (ViT) model. Additionally, we propose 12 hybrid architectures
that combine CNN-based models with ViT in bi-model and tri-model configurations. These models are
rigorously tested on the UI-PRMD and KIMORE benchmark datasets using key evaluation metrics, including
accuracy, precision, recall, and F1-score, with 5-fold cross-validation. Our evaluation also considers
real-time performance, model size, and efficiency on low-power devices, emphasising practical applicability.
The proposed fused tri-model architectures outperform both single-architecture models and bi-model configurations,
demonstrating robust performance across both datasets and making the fused models the preferred choice for
rehabilitation tasks. Our proposed hybrid model, DenMobVit, consistently surpasses state-of-the-art methods,
achieving accuracy improvements of 2.9% and 1.97% on the UI-PRMD and KIMORE datasets, respectively.
These findings highlight the effectiveness of our approach in advancing rehabilitation technologies and bridging
the gap in physiotherapy services.
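To illustrate the time-frequency idea behind the CWT representation mentioned in the abstract, the following is a minimal sketch only, not the authors' pipeline. It assumes a complex Morlet wavelet with centre frequency `w0 = 6`, a synthetic one-dimensional joint trajectory, and a hypothetical sampling rate and set of analysis frequencies; none of these values come from the paper. The functions `morlet` and `cwt` are illustrative names written with the Python standard library alone.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (small admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales, dt=1.0, w0=6.0):
    """Return |CWT| as a list of rows, one row per scale (a scalogram)."""
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for b in range(n):
            acc = 0j
            for k in range(n):
                # Conjugated wavelet, shifted to sample b and dilated by scale s.
                acc += signal[k] * morlet((k - b) * dt / s, w0).conjugate()
            # Riemann-sum approximation of the integral, with 1/sqrt(s) normalisation.
            row.append(abs(acc) * dt / math.sqrt(s))
        rows.append(row)
    return rows


# Hypothetical stand-in for one skeletal joint coordinate over time.
fs = 32.0        # assumed sampling rate in Hz
f_move = 2.0     # assumed repetition frequency of the movement in Hz
n = 256
joint_y = [math.sin(2 * math.pi * f_move * k / fs) for k in range(n)]

# Analysis frequencies and matching Morlet scales (approximation: s = w0 / (2*pi*f)).
freqs = [1.0, 2.0, 4.0, 8.0]
scales = [6.0 / (2 * math.pi * f) for f in freqs]

scalogram = cwt(joint_y, scales, dt=1 / fs)

# Frequency band with the strongest response at the middle of the recording.
mid = n // 2
best = max(range(len(freqs)), key=lambda i: scalogram[i][mid])
print(freqs[best])
```

Each row of `scalogram` tracks how strongly one frequency band is present at every time step, which is the kind of two-dimensional, image-like input a CNN or ViT can consume. This direct summation is quadratic in signal length and is meant for reading rather than speed.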
Keywords
Physical rehabilitation, Deep learning, Transfer learning, Vision Transformer (ViT), Model fusion, Continuous Wavelet Transform (CWT), Mel-Frequency Cepstral Coefficients (MFCC)
Citation
Zaher, M., Ghoneim, A. S., Abdelhamid, L., & Atia, A. (2024). Fusing CNNs and attention-mechanisms to improve real-time indoor Human Activity Recognition for classifying home-based physical rehabilitation exercises. Computers in Biology and Medicine, 184, 109399. https://doi.org/10.1016/j.compbiomed.2024.109399