Machine learning for audio-visual kinship verification
Thesis event information
Date and time of the thesis defence
Place of the thesis defence
Auditorium IT116, Linnanmaa
Topic of the dissertation
Machine learning for audio-visual kinship verification
Doctoral candidate
Master of Science Xiaoting Wu
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, The Center for Machine Vision and Signal Analysis (CMVS)
Subject of study
Computer Science and Engineering
Opponent
Professor Karen Eguiazarian, Tampere University
Custos
Associate Professor Miguel Bordallo López, University of Oulu
Machine learning for audio-visual kinship verification
Human faces carry implicit cues to family ties: people who are biologically related tend to show a perceptible facial resemblance. Psychological studies have found that humans can distinguish parent-child pairs from unrelated pairs just by observing facial images. Inspired by this finding, automatic facial kinship verification has emerged in the field of computer vision and pattern recognition, and many advanced computational models have been developed to assess the facial similarity between kinship pairs. Compared with human perception, automatic kinship verification methods can effectively and objectively capture subtle kin similarities in features such as shape and color. While much effort has been devoted to improving verification performance from human faces, multimodal approaches to kinship verification have not been properly addressed.
This thesis proposes, for the first time, combining human faces and voices to verify kinship, an approach referred to as audio-visual kinship verification. It establishes the first comprehensive audio-visual kinship datasets, which consist of multiple videos of kin-related people speaking to the camera. Extensive experiments on these newly collected datasets detail the comparative performance of the audio and visual modalities and of their combination using novel deep-learning fusion methods. The experimental results indicate that the proposed methods are effective and that audio (voice) information is complementary and useful for the kinship verification problem.
Last updated: 23.1.2024