On Manifold Learning for Subsequent Inference
Abstract
Manifold learning algorithms are successfully used in machine learning and statistical pattern recognition to learn a meaningful low-dimensional representation of high-dimensional data.
Manifold learning is often applied as the first step in addressing a subsequent inference task, such as classification or hypothesis testing, in order to obtain better performance on that task.
Fundamental questions arise about the utility of manifold learning for the subsequent inference: (1) If the true low-dimensional manifold is known, is the subsequent inference in that manifold superior to that in the high-dimensional ambient space? (2) Does the subsequent inference in the learnt low-dimensional manifold recover that in the true manifold?
In this work, we explore answers to these two questions in several inference tasks. We start by considering the power of likelihood ratio tests (LRTs). In multinomial models, we demonstrate that the power in the true manifold is not uniformly superior to that in the high-dimensional ambient space when the sample size is finite. We then consider the expected error of classification in multinomial models and observe that classification in the true manifold fails to yield a uniformly smaller expected error than that in the ambient space.
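As an illustration of the first comparison, the following Monte Carlo sketch (not taken from the paper; the curve p(theta), the sample size, and the alternative are all assumed for demonstration) contrasts the finite-sample power of an LRT restricted to a known one-dimensional submanifold of the multinomial simplex with the LRT computed in the full ambient simplex.

```python
# Illustrative sketch: finite-sample power of the LRT on a known 1-D submanifold
# of the multinomial simplex versus the LRT in the full (ambient) simplex.
# All specific choices below (curve, n, theta values) are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

rng = np.random.default_rng(0)

def p_curve(theta):
    """A hypothetical one-dimensional manifold inside the 3-category simplex."""
    raw = np.array([1.0, np.exp(theta), np.exp(2.0 * theta)])
    return raw / raw.sum()

def loglik(p, counts):
    return np.sum(counts * np.log(np.clip(p, 1e-12, None)))

def lrt_stats(counts, theta0):
    """Return (manifold LRT statistic, ambient LRT statistic) for H0: theta = theta0."""
    ll0 = loglik(p_curve(theta0), counts)
    # Manifold alternative: maximise the likelihood over the curve parameter only (1 df).
    res = minimize_scalar(lambda t: -loglik(p_curve(t), counts),
                          bounds=(-5.0, 5.0), method="bounded")
    stat_manifold = 2.0 * (-res.fun - ll0)
    # Ambient alternative: maximise over the full simplex (MLE = empirical proportions, 2 df).
    p_hat = counts / counts.sum()
    stat_ambient = 2.0 * (loglik(p_hat, counts) - ll0)
    return stat_manifold, stat_ambient

def power(theta_true, theta0=0.0, n=50, reps=2000, alpha=0.05):
    """Monte Carlo power of both tests at a fixed alternative theta_true."""
    crit_m, crit_a = chi2.ppf(1 - alpha, df=1), chi2.ppf(1 - alpha, df=2)
    rej_m = rej_a = 0
    for _ in range(reps):
        counts = rng.multinomial(n, p_curve(theta_true))
        s_m, s_a = lrt_stats(counts, theta0)
        rej_m += s_m > crit_m
        rej_a += s_a > crit_a
    return rej_m / reps, rej_a / reps

print("power (manifold LRT, ambient LRT) at theta=0.4:", power(0.4))
```

Under this assumed setup the two powers can be compared across sample sizes and alternatives; the abstract's point is that neither test dominates uniformly in finite samples.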
Lastly, for network models, we consider the adjacency spectral embedding space as the ambient space, in which a lower-dimensional submanifold potentially exists. Under appropriate conditions, after applying Isomap in the estimated adjacency spectral embedding space to learn the submanifold, we establish that hypothesis testing in the learnt submanifold is asymptotically equivalent to that in the true submanifold.
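A minimal sketch of this pipeline, under an assumed setup (a random dot product graph whose latent positions lie on a hypothetical one-dimensional curve; the curve, graph size, and Isomap parameters are illustrative, not the paper's), is as follows: compute the adjacency spectral embedding, then apply Isomap to the estimated positions to learn the one-dimensional submanifold on which a subsequent hypothesis test would be carried out.

```python
# Illustrative sketch: adjacency spectral embedding (ASE) of a random dot product
# graph whose latent positions lie on a 1-D curve, followed by Isomap to learn
# that submanifold. The specific curve and parameters are assumptions.
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(1)

def rdpg_adjacency(X):
    """Sample a symmetric, hollow adjacency matrix with edge probabilities P = X X^T."""
    P = np.clip(X @ X.T, 0.0, 1.0)
    upper = rng.random(P.shape) < P
    A = np.triu(upper, k=1)
    return (A + A.T).astype(float)

def ase(A, d):
    """Adjacency spectral embedding: top-d eigenvectors of A, scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Latent positions on a hypothetical 1-D curve inside a 2-D ambient space,
# scaled so that all inner products are valid probabilities.
n = 400
t = rng.uniform(0.1, 0.9, size=n)                              # true manifold parameter
X = 0.8 * np.column_stack([t, 0.5 * np.sin(np.pi * t) + 0.1])  # curve in R^2

A = rdpg_adjacency(X)
X_hat = ase(A, d=2)                                            # estimated ambient positions
z_hat = Isomap(n_neighbors=10, n_components=1).fit_transform(X_hat)  # learnt 1-D submanifold

# z_hat is the learnt one-dimensional representation on which a subsequent
# hypothesis test (e.g. a two-sample test on subsets of nodes) could be performed.
print(z_hat.shape)
```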