Subspace Learning for Data Arising from a Union of Subspaces of High Relative Dimension
Johns Hopkins University
As we have witnessed the rapid growth of statistical machine learning over the past decades, the ability of processing big and corrupted data becomes increasingly important. One of the major challenges is that structured data, such as images, videos and 3D point clouds, involved in many application scenarios are high-dimensional. Conventional techniques usually approximate the high-dimensional data with low-dimensional structures by fitting the data with one or more linear subspaces. However, their theory and algorithms are restricted to the setting in which the underlying subspaces have a low relative dimension compared to the ambient space. This thesis attempts to advance the understanding of subspace learning for data arising from subspaces of high relative dimension, as well as develop efficient algorithms for handling big and corrupted data. The first major contribution of this thesis is a theoretical analysis that extends Dual Principal Component Pursuit (DPCP), a non-convex approach that learns a hyperplane in the presence of noiseless data, to learn a subspace of any dimension with noisy data. We provide geometric and probabilistic analyses to characterize how the principal angles between the global solution and the orthogonal complement of the subspace behave as a function of the noise level. Moreover, we improve the DPCP theory in multi-hyperplane case with a more interpretable geometric analysis and a new statistical analysis. The second major contribution of this thesis is the development of a linearly convergent method for non-convex optimization on the Grassmannian. We show that if the objective function satisfies a certain Riemannian regularity condition (RRC) with respect to some point in the Grassmannian, then a Projected Riemannian Sub-Gradient Method (PRSGM) converges at a linear rate to that point. In particular, we prove that the DPCP problem for learning a single subspace satisfies the RRC and PRSGM converges linearly to a neighborhood of the orthogonal complement of the subspace with error proportional to the noise level. We also extend the RRC to DPCP for a union of hyperplanes and prove the linear convergence of PRSGM to a specific hyperplane. Finally, both synthetic and real experiments demonstrate the superiority of the proposed method.
subspace learning, subspace clustering, high relative dimension, dual principal component pursuit