Recognizing Sign Language Gestures Using a Hybrid Spatio-Temporal Deep Learning Model
- 1 Department of Computer Science, Faculty of Sciences Dhar-Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Abstract
Recognizing gestures in American Sign Language (ASL) from video data presents significant challenges due to the intricate combination of hand gestures, facial cues, and body motion. In this work, we introduce a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs) for extracting spatial characteristics with Long Short-Term Memory (LSTM) networks for capturing temporal sequences. The model was trained and evaluated on a subset of 25 classes from the WLASL dataset, a comprehensive video collection comprising over 2,000 labeled ASL signs. Achieving an accuracy of 96%, the proposed system demonstrates superior performance compared to traditional methods. These findings underscore the strength of spatio-temporal modeling in sign language recognition. With a design geared toward scalability and real-time deployment, the approach shows strong potential to support communication and accessibility for individuals with hearing impairments. Future developments will aim to mitigate class imbalance, broaden applicability to other sign languages, and assess the benefits of Transformer-based models for enhanced recognition.
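The abstract describes the architecture only at a high level: a per-frame CNN for spatial features feeding an LSTM that models the temporal sequence. The following is a minimal, hypothetical PyTorch sketch of such a CNN+LSTM pipeline for a 25-class setup; the backbone, feature dimension, hidden size, and input resolution are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn

class CNNLSTMSignClassifier(nn.Module):
    """Hypothetical sketch of the CNN+LSTM pipeline sketched in the abstract:
    a per-frame CNN extracts spatial features, an LSTM models the temporal
    sequence, and a linear head predicts one of 25 WLASL glosses."""

    def __init__(self, num_classes: int = 25, feat_dim: int = 128, hidden: int = 256):
        super().__init__()
        # Small per-frame CNN (the paper's actual backbone is not specified here).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # The LSTM consumes the sequence of per-frame feature vectors.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)  # last hidden state summarizes the clip
        return self.head(h_n[-1])       # logits over the 25 sign classes

# Usage sketch: a batch of 2 clips, 16 RGB frames each at an assumed 112x112.
if __name__ == "__main__":
    model = CNNLSTMSignClassifier()
    logits = model(torch.randn(2, 16, 3, 112, 112))
    print(logits.shape)  # torch.Size([2, 25])
```

Applying the CNN frame by frame and summarizing the sequence with the LSTM's final hidden state is one common way to realize the spatio-temporal factorization the abstract describes; attention pooling or Transformer encoders (mentioned as future work) would replace the LSTM stage.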
DOI: https://doi.org/10.3844/jcssp.2025.2965.2974
Copyright: © 2025 Meryem Cherrate, My Abdelouahed Sabri, Ali Yahyaouy and Abdellah Aarab. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- American Sign Language
- Gesture Recognition
- WLASL Dataset
- Deep Learning
- Communication
- Assistive Technology