Date of Award
12-2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Automotive Engineering
Committee Chair/Advisor
Qilun Zhu
Committee Member
Laine Mears
Committee Member
Siyu Huang
Committee Member
Bing Li
Abstract
The advancement of autonomous driving technology and intelligent robotic applications has emerged as a focal point in the field of autonomy. One of the driving forces behind this trend is a profound understanding of the environment, and at the core of this endeavor lies three-dimensional geometric perception. This dissertation embarks on a comprehensive exploration of this domain, emphasizing advances in depth prediction, 3D scene reconstruction, and active vision to enhance geometric perception and scene understanding capabilities in autonomous driving, embodied AI, and robotics.

In the domain of depth prediction, this research addresses the challenges of accurately inferring three-dimensional depth from monocular cameras. It introduces innovative methods to improve the accuracy and robustness of depth prediction by leveraging low-cost sparse LiDAR and by handling the motion and occlusion of dynamic objects. This is critical for perception in autonomous driving and robotics, especially for vital tasks such as object detection and obstacle avoidance.

Furthermore, we delve into the intricacies of 3D scene reconstruction, aiming to capture the environment's geometric structures and objects' positions. In particular, this dissertation underscores the significance of meaningful geometric feature learning as a pivotal component of 3D reconstruction: by integrating multiple observed frames into an improved cost volume, it provides a richer and more accurate geometric encoding of the 3D scene, offering a more precise environmental reconstruction for autonomous driving and robotic systems.

The concept of active vision, especially for neural implicit reconstruction, is also explored. Unlike traditional geometric perception, which relies on passively collected data, active vision enables an intelligent agent to explore and reconstruct an unknown scene automatically. This approach eliminates the reliance on human operation and navigation, and it achieves faster exploration and perception coverage, providing more comprehensive scene understanding and perception for embodied AI and robotics applications.

In summary, this dissertation represents the cutting edge of geometric perception in 3D computer vision. It is dedicated to enhancing scene understanding capabilities in autonomous driving and robotic applications, from depth prediction to 3D scene reconstruction, from sensor fusion to geometric feature learning, and to active perception. It lays a solid foundation for the future development of intelligent automotive and robotic technologies.
Recommended Citation
Feng, Ziyue, "Advancing Visual Geometric Perception: Camera-Based Depth, Reconstruction, and Active Vision" (2024). All Dissertations. 3823.
https://open.clemson.edu/all_dissertations/3823
Author ORCID Identifier
0000-0002-0037-3697