World Scientific

Multi-Modal Fusion Sign Language Recognition Based on Residual Network and Attention Mechanism

    https://doi.org/10.1142/S0218001422500367
    Cited by: 4 (Source: Crossref)

    Sign language recognition (SLR) is a useful tool that helps deaf and mute people communicate with the outside world. Although many SLR methods have been proposed and perform well, continuous SLR (CSLR) remains challenging: heavy occlusions and closely interacting motions complicate recognition, and practical use places high demands on real-time efficiency. The performance of CSLR therefore needs further improvement. The main contributions of this paper are as follows. (1) To overcome these challenges, this paper proposes a novel video-based CSLR framework consisting of three components: an OpenPose-based skeleton stream extraction module, an RGB stream extraction module, and a module that combines a BiLSTM network with a conditional hidden Markov model (CHMM) for CSLR. (2) A new residual network with Squeeze-and-Excitation blocks (SEResNet50) is proposed for video sequence feature extraction. (3) The SEResNet50 module is combined with the BiLSTM network to extract feature information from video streams of different modalities. To evaluate the effectiveness of the proposed framework, experiments are conducted on two CSL datasets. The experimental results indicate that our method outperforms existing methods in the literature.
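The abstract does not give the configuration of SEResNet50, so the following is only a minimal sketch of the generic Squeeze-and-Excitation recipe that such a network builds on: global average pooling per channel ("squeeze"), a bias-free two-layer bottleneck with ReLU and sigmoid ("excitation"), and channel-wise rescaling. The function names, the toy weights and the reduction ratio r = 2 are illustrative assumptions, not the authors' implementation; pure Python lists are used so the example runs without any framework.

```python
import math
from typing import List

Matrix = List[List[float]]  # one channel: H x W values (illustrative layout)


def squeeze(feature_map: List[Matrix]) -> List[float]:
    """Global average pooling: one descriptor z_c per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense(weights: Matrix, vec: List[float]) -> List[float]:
    """Bias-free fully connected layer; weights has shape (out, in)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excitation(z: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Bottleneck C -> C/r with ReLU, then C/r -> C with sigmoid: gates in (0, 1)."""
    hidden = [max(0.0, h) for h in dense(w1, z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in dense(w2, hidden)]


def se_block(feature_map: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Rescale every channel of the feature map by its gate s_c."""
    gates = excitation(squeeze(feature_map), w1, w2)
    return [[[x * s for x in row] for row in ch] for ch, s in zip(feature_map, gates)]


if __name__ == "__main__":
    # Hypothetical toy input: two 2x2 channels (C = 2), r = 2, so C/r = 1.
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    w1 = [[0.5, 0.5]]      # shape (1, 2)
    w2 = [[1.0], [-1.0]]   # shape (2, 1)
    print(se_block(fmap, w1, w2))
```

In the original SE-ResNet design, this recalibration is applied to the output of each residual branch before the identity shortcut is added; whether SEResNet50 places its SE blocks the same way is an assumption here. In a trained network, `w1` and `w2` are learned, so the gates let the model emphasize informative channels of each video frame's features.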
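The abstract names a conditional hidden Markov model (CHMM) on top of the BiLSTM but does not define it. As a clearly labeled stand-in, the sketch below shows ordinary first-order HMM Viterbi decoding in log space over per-frame scores, which is one common way to turn frame-level network outputs into a label sequence; it is not the authors' CHMM. The functions `viterbi` and `collapse`, the two-state setup and all probabilities are hypothetical.

```python
import math
from typing import List, Tuple


def viterbi(log_init: List[float], log_trans: List[List[float]],
            log_emit: List[List[float]]) -> Tuple[List[int], float]:
    """Most likely state path under a first-order HMM, computed in log space.

    log_init[s]     : log P(state s at t = 0)
    log_trans[p][s] : log P(state s at t | state p at t - 1)
    log_emit[t][s]  : per-frame log score for state s (e.g. from a BiLSTM)
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):
        ptr, new_score = [], []
        for s in range(n):
            # Best predecessor state for landing in state s at frame t.
            best = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_emit[t][s])
        backptrs.append(ptr)
        score = new_score
    # Backtrack from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def collapse(path: List[int]) -> List[int]:
    """Merge consecutive repeats so each run of frames yields one label."""
    return [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]


if __name__ == "__main__":
    lg = math.log
    log_init = [lg(0.5), lg(0.5)]
    log_trans = [[lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)]]  # "sticky" states
    log_emit = [[lg(0.9), lg(0.1)]] * 2 + [[lg(0.1), lg(0.9)]] * 2
    path, _ = viterbi(log_init, log_trans, log_emit)
    print(path, collapse(path))  # → [0, 0, 1, 1] [0, 1]
```

Here each state stands for one gloss and runs of identical frame labels are merged into a single gloss; a real continuous recognizer would typically use several sub-states per gloss and learned transition probabilities.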