From 3D sensing to dense prediction
Thesis event information
Date and time of the thesis defence
Place of the thesis defence
Auditorium IT116, Linnanmaa
Topic of the dissertation
From 3D sensing to dense prediction
Doctoral candidate
Master of Science (Technology) Lam Huynh
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, The Center for Machine Vision and Signal Analysis (CMVS)
Subject of study
Computer Science and Engineering
Opponent
Professor Michael Felsberg, Linköping University
Custos
Professor Janne Heikkilä, University of Oulu
From 3D sensing to dense prediction
This thesis introduces novel learning-based approaches for improving 3D sensing and dense prediction. In recent years, deep neural networks (DNNs) have thrived on various vision tasks. Nonetheless, current developments involve a trade-off among accuracy, network size, and architectural engineering cost. This work proposes accurate and lightweight DNNs by exploiting prior knowledge, integrating self-attention, fusing multi-scale 2D and 3D representations, and presenting efficient neural architecture search (NAS) strategies.
Recent monocular depth estimation approaches exhibit impressive results. However, these are often achieved with bulky network architectures that employ up to hundreds of millions of parameters and require massive training data. This thesis introduces architectures that exploit geometric constraints and non-local self-attention mechanisms to improve performance. Moreover, the methods achieve state-of-the-art results while using at least ten times fewer parameters than competing approaches.
Depth completion aims to densify sparse input depth measurements. The best-performing depth completion methods work only when the 3D point density is relatively high. This work proposes a novel multi-scale framework that operates directly on both 2D and 3D feature spaces. Unlike previous approaches, the method performs well on extremely sparse and unevenly distributed 3D points. The proposed architecture is also very compact and works with input 3D points from an arbitrary source.
Dense prediction solves mapping problems at the pixel level and comprises many sub-tasks, such as depth estimation, semantic segmentation, optical flow prediction, and image restoration. Existing methods usually rely on hand-engineered DNNs or focus on a single sub-task. This thesis presents a novel approach that uses NAS to address more general dense prediction problems, enabling holistic scene understanding.
Last updated: 23.1.2024