POSE ESTIMATION AND SHAPE ABSTRACTION FROM A PROBABILISTIC AND GEOMETRIC PERSPECTIVE
Johns Hopkins University
In recent years, we have witnessed great progress in how intelligent machines perceive the environment. To humans, perception means understanding the surroundings by interpreting sensory input (e.g., vision, haptics, language) as semantic primitives, which are logically understandable and mutually communicable. For machines, in their current form, perception can be defined as the process of formalizing a mathematical representation of the environment, from which higher-level tasks can be efficiently planned, modeled, and solved. Nowadays, most artificial intelligence (AI) systems rely on data-driven neural networks to establish versatile implicit representations. However, the learned representations are, to a large degree, task-specific, cumbersome, and poor at generalizing. Moreover, a learned representation is closed within itself, i.e., rarely interpretable by, or communicable to, human beings or other AI systems. This dissertation focuses on achieving interpretable visual perception through geometric analysis and probabilistic modeling. To this end, it targets two fundamental visual perception problems: pose estimation and shape abstraction. For the first problem, the author proposes a geometry-guided probabilistic model for accurate and robust point cloud registration, which is the basis for understanding the pose relationship between visual inputs. Furthermore, the author proposes an unconstrained optimization method on a matrix Lie group to estimate the rigid transformation efficiently. Next, the methodology is extended to extract compact, geometric primitive-based abstractions from visual inputs. This task goes beyond pose estimation and further includes a shape-level description of the visual input. Geometric primitives called superquadrics are studied and utilized as the basis for perceiving the environment.
The primitive-based shape abstraction is concise and fully analytical, which makes it a desirable standard messenger between perception and higher-level tasks such as segmentation, robot grasping, collision detection, motion planning, and physical simulation. Furthermore, the approach is learning-free, inherently generalizable, and interpretable, yet remains competitive in expressiveness with learning-based representations.
computer vision, computational geometry, superquadrics, pose estimation, shape abstraction, probabilistic inference, Bayesian inference, robotics