"Advancing Visual Geometric Perception: Camera-Based Depth, Reconstruct" by Ziyue Feng

Date of Award

12-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Automotive Engineering

Committee Chair/Advisor

Qilun Zhu

Committee Member

Laine Mears

Committee Member

Siyu Huang

Committee Member

Bing Li

Abstract

The advancement of autonomous driving technology and intelligent robotic applications has emerged as a focal point in the realm of autonomy. One of the driving forces behind this trend is a profound understanding of the environment, and at the core of this endeavor lies three-dimensional geometric perception. This dissertation embarks on a comprehensive exploration of this domain, emphasizing advances in depth prediction, 3D scene reconstruction, and active vision to enhance geometric perception and scene understanding capabilities in autonomous driving, embodied AI, and robotics. In the domain of depth prediction, this research addresses the challenges of accurately inferring three-dimensional depth from monocular cameras. It introduces innovative methods to improve the accuracy and robustness of depth prediction by leveraging low-cost sparse LiDAR and handling the motion and occlusion of dynamic objects. This is critical for perception in autonomous driving and robotics, especially for vital tasks such as object detection and obstacle avoidance. Furthermore, we delve into the intricacies of 3D scene reconstruction, aiming to capture the environment's geometric structures and objects' positions. In particular, this dissertation underscores the significance of meaningful geometric feature learning as a pivotal component of 3D reconstruction. By integrating multiple observed frames into an improved cost volume, it provides a richer and more accurate geometric encoding of the 3D scene, offering a more precise environmental reconstruction for autonomous driving and robot systems. The concept of active vision, especially for neural implicit reconstruction, is also explored. Unlike traditional geometric perception, which relies on passively collected data, active vision enables an intelligent agent to explore and reconstruct an unknown scene automatically. This approach eliminates the reliance on human operation and navigation. It achieves faster exploration and perception coverage, providing a more comprehensive scene understanding and perception for embodied AI and robotics applications. In summary, this dissertation represents the cutting edge of geometric perception in 3D computer vision. It is dedicated to enhancing scene understanding capabilities in autonomous driving and robotic applications, from depth prediction to 3D scene reconstruction, sensor fusion to geometric feature learning, and active perception. It lays a solid foundation for the future development of intelligent automotive and robotic technologies.

Author ORCID Identifier

0000-0002-0037-3697
