Fusing CNNs and attention-mechanisms to improve real-time indoor Human Activity Recognition for classifying home-based physical rehabilitation exercises

Loading...
Thumbnail Image

Date

2025-01-01

Journal Title

Journal ISSN

Volume Title

Type

Article

Publisher

Elsevier Ltd

Series Info

Computers in Biology and Medicine ; 184 (2025) 109399

Abstract

Physical rehabilitation plays a critical role in enhancing health outcomes globally. However, the shortage of physiotherapists, particularly in developing countries where the ratio is approximately ten physiotherapists per million people, poses a significant challenge to effective rehabilitation services. The existing literature on rehabilitation often falls short in data representation and the employment of diverse modalities, limiting the potential for advanced therapeutic interventions. To address this gap, This study integrates Computer Vision and Human Activity Recognition (HAR) technologies to support home-based rehabilitation. The study mitigates this gap by exploring various modalities and proposing a framework for data representation. We introduce a novel framework that leverages both Continuous Wavelet Transform (CWT) and Mel-Frequency Cepstral Coefficients (MFCC) for skeletal data representation. CWT is particularly valuable for capturing the time-frequency characteristics of dynamic movements involved in rehabilitation exercises, enabling a comprehensive depiction of both temporal and spectral features. This dual capability is crucial for accurately modelling the complex and variable nature of rehabilitation exercises. In our analysis, we evaluate 20 CNNbased models and one Vision Transformer (ViT) model. Additionally, we propose 12 hybrid architectures that combine CNN-based models with ViT in bi-model and tri-model configurations. These models are rigorously tested on the UI-PRMD and KIMORE benchmark datasets using key evaluation metrics, including accuracy, precision, recall, and F1-score, with 5-fold cross-validation. Our evaluation also considers realtime performance, model size, and efficiency on low-power devices, emphasising practical applicability. The proposed fused tri-model architectures outperform both single-architectures and bi-model configurations, demonstrating robust performance across both datasets and making the fused models the preferred choice for rehabilitation tasks. Our proposed hybrid model, DenMobVit, consistently surpasses state-of-the-art methods, achieving accuracy improvements of 2.9% and 1.97% on the UI-PRMD and KIMORE datasets, respectively. These findings highlight the effectiveness of our approach in advancing rehabilitation technologies and bridging the gap in physiotherapy services.

Description

Keywords

Physical rehabilitation , Deep learning , Transfer learning , Vision Transformer (ViT) , Model fusion , Continuous Wavelet Transform (CWT) , Mel-Frequency Cepstral Coefficients (MFCC)

Citation

Zaher, M., Ghoneim, A. S., Abdelhamid, L., & Atia, A. (2024). Fusing CNNs and attention-mechanisms to improve real-time indoor Human Activity Recognition for classifying home-based physical rehabilitation exercises. Computers in Biology and Medicine, 184, 109399. https://doi.org/10.1016/j.compbiomed.2024.109399