Video representation and deep learning techniques for face presentation attack detection
Thesis event information
Date and time of the thesis defence
Place of the thesis defence
L10, Linnanmaa
Topic of the dissertation
Video representation and deep learning techniques for face presentation attack detection
Doctoral candidate
Master of Science Usman Muhammad
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Machine Vision and Signal Analysis
Subject of study
Computer Science
Opponent
Professor Moncef Gabbouj, Tampere University
Custos
Associate Professor Mourad Oussalah, University of Oulu
Video representation and deep learning techniques for face presentation attack detection
Facial recognition technology has rapidly gained popularity in a number of security applications, including airport passenger screening, cell phone unlocking, banking, and law enforcement surveillance. Unfortunately, recent studies show that facial recognition systems can be vulnerable to spoofing, also known as a presentation attack. For instance, an attacker can fraudulently gain access to a biometric system by presenting a photograph, a video replay, a silicone mask, or even a 3D mask to the camera. In recent years, significant efforts have been made to develop software- and hardware-based countermeasures, but their performance degrades drastically under real-world conditions (e.g., varying lighting and illumination, user demographics, and differing input cameras).
This thesis addresses recent developments in face anti-spoofing methods. In particular, we propose video representation and deep learning techniques to exploit the spatial and temporal information that distinguishes bona fide from attack videos. This is a challenging task because 1) both real and spoofed videos contain rich spatiotemporal information and 2) data labeling remains a challenge. From this perspective, we investigate feature fusion methods that weigh the importance of individual features, since the better a model's features, the more accurate it is. Our results suggest that hybrid deep learning provides stronger discriminative power than the deep features of a single model. In addition, we introduce a mechanism called sample learning for feature augmentation. We show that directly feeding convolutional features into a recurrent neural network risks introducing interference information (e.g., mutual exclusion and redundancy), which can limit the performance of presentation attack detection (PAD).
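As an illustration only, the sketch below shows the kind of CNN-to-RNN pipeline discussed above, in which per-frame convolutional features are fed to a recurrent layer for bona fide vs. attack classification. The ResNet-18 backbone, GRU, and all dimensions are assumptions made for the sake of a small runnable example, not the specific architectures evaluated in the thesis.

import torch
import torch.nn as nn
from torchvision import models

class CnnRnnPAD(nn.Module):
    """Per-frame CNN features fed to a GRU for binary PAD (bona fide vs. attack)."""
    def __init__(self, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=None)                    # pretrained weights optional
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # keep everything up to global pooling
        self.rnn = nn.GRU(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)                       # logits: bona fide vs. attack

    def forward(self, clips):                                       # clips: (batch, frames, 3, 224, 224)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 512) per-frame features
        _, last_hidden = self.rnn(feats.reshape(b, t, -1))          # last_hidden: (1, b, hidden)
        return self.head(last_hidden.squeeze(0))                    # (b, 2) clip-level logits

logits = CnnRnnPAD()(torch.randn(2, 8, 3, 224, 224))                # two clips of 8 frames each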
Another key challenge is to learn powerful deep features without depending on human-labeled data, which calls on the research community to focus more on developing robust PAD countermeasures. To this end, we develop two countermeasures based on self-supervised learning, alleviating the annotation bottleneck by letting models obtain supervision from the data itself. Finally, generalization capability is considered: the proposed methods encode complex patterns in PAD videos based on global motion and data augmentation to obtain discriminative representations.
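As background on how supervision can come from the data itself, the following sketch shows one generic self-supervised objective: a contrastive (NT-Xent-style) loss over two augmented views of the same clip, where the pairing of views, rather than human labels, defines the positives. It is given only as an illustration; the batch size, embedding dimension, and temperature are assumptions, and it is not the specific countermeasure proposed in the thesis.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over two augmented views; labels come from the view pairing, not annotation."""
    b = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)              # (2b, dim), unit-norm embeddings
    sim = z @ z.t() / temperature                                   # pairwise cosine similarities
    sim = sim.masked_fill(torch.eye(2 * b, dtype=torch.bool), float("-inf"))  # exclude self-pairs
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])         # index of each row's positive
    return F.cross_entropy(sim, targets)

loss = nt_xent_loss(torch.randn(4, 128), torch.randn(4, 128))       # embeddings of two views of 4 clips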
Last updated: 23.1.2024