Deep Learning Methods for Analyzing Vision-Based Emotion Recognition from 3D/4D Facial Point Clouds
Thesis event information
Date and time of the thesis defence
Place of the thesis defence
Auditorium L5, Linnanmaa
Topic of the dissertation
Deep Learning Methods for Analyzing Vision-Based Emotion Recognition from 3D/4D Facial Point Clouds
Doctoral candidate
Master of Science Muzammil Behzad
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Machine Vision and Signal Analysis
Subject of study
Computer Science and Engineering
Opponent
Professor Hui Yu, University of Portsmouth, UK
Custos
Academy Professor Guoying Zhao, University of Oulu
Deep Learning Methods for Analyzing Vision-Based Emotion Recognition from 3D/4D Facial Point Clouds
Facial expressions are one of the most vital ways for humans to express and communicate emotions. Their role in giving emphasis, clarifying meaning, and conveying internal feelings or intentions, and their importance in structuring critical aspects of human interaction, are widely acknowledged. With the advent of state-of-the-art technologies such as deep learning, systems that automatically recognize and analyze facial expressions have proved exceptionally instrumental in understanding human behavior. This has spurred the development of recognition systems with applications in a wide range of areas including, but not limited to, security, psychology, medicine, and robotics.
To further improve the performance of such facial expression recognition (FER) systems, the use of 3D/4D facial point clouds has expanded facial expression analysis by mitigating the inherent problems of processing 2D facial images, e.g., out-of-plane motions, head pose variations, and illumination conditions. In this regard, the release of facial expression datasets containing 3D/4D face scans has enabled effective affect recognition by capturing facial deformation patterns both spatially and temporally. At the same time, such data brings its own challenges, for instance, a complex data structure and limited dataset size. Its analysis therefore requires the use, extension, and introduction of promising new approaches to develop successful recognition systems.
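As a rough illustration of what such spatio-temporal data can look like, the minimal sketch below represents a 4D scan as a temporal sequence of 3D point clouds in PyTorch. All shapes and variable names here are hypothetical assumptions for illustration only, not taken from the thesis or its datasets.

# Illustrative sketch (assumed shapes, not from the thesis): a 4D facial
# scan represented as a temporal sequence of 3D point clouds in PyTorch.
import torch

num_frames, num_points = 60, 4096        # hypothetical scan size
# Each frame stores (x, y, z) coordinates for every facial surface point.
sequence = torch.randn(num_frames, num_points, 3)

# Spatial pattern: per-frame geometry; temporal pattern: frame-to-frame
# motion, here approximated by point-wise displacements under the
# simplifying assumption that points correspond across frames.
deformation = sequence[1:] - sequence[:-1]   # shape (59, 4096, 3)
print(sequence.shape, deformation.shape)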
This thesis develops a number of deep learning methods for building robust models that recognize emotions from 3D/4D facial point clouds. Specifically, the thesis first focuses on collaborative emotion recognition, in which multiple facial views are used together with facial landmarks. Secondly, it highlights the importance of sparsity-aware affect recognition and its role in building effective deep learning models. Thirdly, it presents a multi-view transformer architecture that learns spatial embeddings by exploiting correlations across the multi-view embeddings, together with the formulation of a gradient-friendly loss function. Next, a novel multi-view facial rendezvous model is discussed that learns to recognize expressions in a self-supervised fashion. Finally, the contributions of the thesis are summarized and potential future directions for 3D/4D FER research are discussed.
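To give a loose, hypothetical sketch of the multi-view idea (the thesis's actual architecture and loss are not reproduced here), the fragment below fuses per-view facial embeddings with a small transformer encoder, whose self-attention can exploit correlations across views, and applies a standard cross-entropy loss as a stand-in. All class names, dimensions, and hyperparameters are assumptions.

# Illustrative sketch only: fusing per-view facial embeddings with a small
# transformer encoder. This mirrors the general multi-view idea described
# above; it is not the thesis model.
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, embed_dim=256, num_views=3, num_classes=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, batch_first=True)
        # Self-attention across the view axis lets the model exploit
        # correlations between the per-view embeddings.
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, view_embeddings):           # (batch, views, embed_dim)
        fused = self.encoder(view_embeddings)     # attend across views
        return self.head(fused.mean(dim=1))       # pool views, classify

model = MultiViewFusion()
views = torch.randn(8, 3, 256)                    # 8 samples, 3 views each
logits = model(views)                             # (8, 6) expression scores
# A smooth, gradient-friendly loss such as cross-entropy can then be applied:
loss = nn.functional.cross_entropy(logits, torch.randint(0, 6, (8,)))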