https://doi.org/10.1140/epjs/s11734-025-01586-z
Regular Article
A deep learning approach for strengthening person identification in face-based authentication systems using visual speech recognition
Department of Computer Applications, National Institute of Technology, Tiruchirappalli 620015, Tamil Nadu, India
Received: 23 January 2025
Accepted: 14 March 2025
Published online: 25 March 2025
Identity verification is essential in both an individual's personal and professional life. It confirms a person's identity for various services and establishes their legitimacy as an employee within an organization. As cybercrime grows more sophisticated, ensuring robust and secure personal authentication has become a critical challenge. Existing face-based authentication systems typically employ deep learning models for user verification. However, these systems are susceptible to various attacks, such as presentation attacks, 3D mask attacks, and adversarial attacks, that deceive the models by manipulating digital representations of human faces. Although various liveness detection techniques have been proposed to combat face spoofing in face-based authentication systems, these systems remain vulnerable and can be exploited by sophisticated techniques. To counteract face spoofing in a face-based authentication system, we propose an advanced liveness detection technique using Visual Speech Recognition (VSR). The proposed VSR model is designed to integrate seamlessly with face-based authentication systems, forming a dual authentication framework for enhanced liveness detection. The VSR model decodes silently pronounced speech from video into a textual representation by analyzing unique, unforgeable lip motion patterns. Effective liveness detection with VSR requires a highly accurate VSR system. The proposed work employs an encoder-decoder architecture to extract more robust features from lip motion. The encoder combines a three-dimensional convolutional neural network (3D-CNN) with a fusion of bidirectional gated recurrent units and bidirectional long short-term memory (BiGRU-BiLSTM) to effectively capture spatio-temporal patterns from lip movement. The decoder integrates Multi-Head Attention (MHA) with BiGRU-BiLSTM to focus on relevant features and enhance contextual understanding for more accurate text prediction. The proposed VSR system achieved a word error rate (WER) of 0.79%, a significant reduction in error rate that outperforms existing VSR models.
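The abstract describes the encoder-decoder pipeline only at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of that description: a 3D-CNN front end feeding a fused BiGRU-BiLSTM encoder, and a decoder that applies multi-head attention before another BiGRU-BiLSTM stack and a character classifier. All layer sizes, the grayscale 64x64 lip-crop input, the self-attention formulation, and the CTC-style output head are our assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the described 3D-CNN + BiGRU-BiLSTM encoder and
# MHA + BiGRU-BiLSTM decoder. Dimensions and vocabulary are illustrative
# assumptions, not the authors' published configuration.
import torch
import torch.nn as nn

class VSREncoder(nn.Module):
    """3D-CNN front end followed by fused BiGRU-BiLSTM recurrent layers."""
    def __init__(self, hidden=256):
        super().__init__()
        # 3D convolution captures spatio-temporal lip-motion features from
        # a (batch, channels, frames, height, width) video tensor.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep T, pool space to 4x4
        )
        self.bigru = nn.GRU(32 * 4 * 4, hidden, bidirectional=True, batch_first=True)
        self.bilstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, video):                 # video: (B, 1, T, H, W)
        f = self.conv3d(video)                # (B, 32, T, 4, 4)
        b, c, t, h, w = f.shape
        f = f.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)  # (B, T, 512)
        f, _ = self.bigru(f)
        f, _ = self.bilstm(f)                 # (B, T, 2*hidden)
        return f

class VSRDecoder(nn.Module):
    """Multi-head attention over encoder features, then BiGRU-BiLSTM and a classifier."""
    def __init__(self, feat_dim=512, hidden=256, vocab_size=28, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.bigru = nn.GRU(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.bilstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, vocab_size)  # e.g. characters + CTC blank

    def forward(self, enc):
        ctx, _ = self.attn(enc, enc, enc)     # attend over encoder time steps
        out, _ = self.bigru(ctx)
        out, _ = self.bilstm(out)
        return self.fc(out)                   # per-frame logits, CTC-decodable

# Usage: a 75-frame grayscale lip-region clip at 64x64 resolution.
video = torch.randn(2, 1, 75, 64, 64)
logits = VSRDecoder()(VSREncoder()(video))    # (2, 75, 28)
```

Under these assumptions the per-frame logits would be trained with a CTC loss and greedy- or beam-decoded into text, whose agreement with the prompted passphrase then serves as the liveness signal in the dual authentication framework.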
© The Author(s), under exclusive licence to EDP Sciences, Springer-Verlag GmbH Germany, part of Springer Nature 2025
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.