From 3D sensing to dense prediction

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

Auditorium IT116, Linnanmaa

Topic of the dissertation

From 3D sensing to dense prediction

Doctoral candidate

Master of Science (Technology) Lam Huynh

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, The Center for Machine Vision and Signal Analysis (CMVS)

Subject of study

Computer Science and Engineering

Opponent

Professor Michael Felsberg, Linköping University

Custos

Professor Janne Heikkilä, University of Oulu

From 3D sensing to dense prediction

This thesis introduces novel learning-based approaches for improving 3D sensing and dense prediction. In recent years, deep neural networks (DNNs) have excelled at various vision tasks. Nonetheless, current methods must trade off accuracy, network size, and architectural engineering cost. This work proposes accurate and lightweight DNNs by exploiting prior knowledge, integrating self-attention, fusing multi-scale 2D and 3D representations, and presenting efficient neural architecture search (NAS) strategies.

Recent monocular depth estimation approaches exhibit impressive results. However, these results are often achieved with bulky network architectures that employ up to hundreds of millions of parameters and require massive amounts of training data. This thesis introduces architectures that exploit geometric constraints and non-local self-attention mechanisms to improve performance. Moreover, the methods achieve state-of-the-art results while using at least ten times fewer parameters than competing approaches.

Depth completion aims to densify sparse input depth measurements. The best-performing depth completion methods work only in cases with relatively high 3D point density. This work proposes a novel multi-scale framework that operates directly on both 2D and 3D feature spaces. Unlike previous approaches, the method performs well on extremely sparse and unevenly distributed 3D points. The proposed architecture is also very compact and works with 3D input points from an arbitrary source.

Dense prediction addresses mapping problems at the pixel level and comprises many sub-tasks, such as depth estimation, semantic segmentation, optical flow prediction, and image restoration. Existing methods usually rely on hand-engineered DNNs or focus on a single sub-task. This thesis presents a novel approach that uses NAS to address more general dense prediction problems, enabling holistic scene understanding.
Last updated: 23.1.2024