Multiplicatively Perturbed Least Squares for Dimension Reduction

Johns Hopkins University
Dimension reduction is a crucial aspect of modern data science, offering computational efficiency, insight into the structure of problems, and improved accuracy in downstream regression tasks. A well-known result in approximation theory states that the mean squared error of a nonparametric regression problem cannot be guaranteed to decrease faster than N^{-2p/(2p+D)}, where N is the number of samples, p is a smoothness parameter of the problem, and D is the dimension of the inputs. This slow rate reflects the so-called ``Curse of Dimensionality,'' in which samples in high-dimensional domains are exponentially likely to be well isolated from one another. These concerns motivate research into algorithms that identify intrinsic structure in the functions being regressed, since any reduction in D yields an exponential improvement in the lower bound on sample complexity. Even in parametric settings, large D increases computational complexity and hinders the search for useful parameter values. In this thesis, we discuss various existing methods of dimension reduction and introduce our own: Multiplicatively Perturbed Least Squares (MPLS). We provide a theoretical analysis proving that MPLS achieves the optimal convergence rate of N^{-1/2}, up to logarithmic factors, for a broad class of functions. This analysis is supplemented by a series of experiments in which MPLS performs comparably to or better than existing dimension reduction algorithms.
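To make the role of D concrete, the following sketch (not from the thesis; the function name and parameter values are illustrative) evaluates the lower-bound rate N^{-2p/(2p+D)} for a fixed sample size and smoothness, alongside the N^{-1/2} rate the abstract cites:

```python
# Illustrative sketch: how the nonparametric minimax rate
# N^(-2p/(2p+D)) degrades as the input dimension D grows.

def minimax_rate(N: float, p: float, D: int) -> float:
    """Lower-bound scaling of mean squared error for a p-smooth
    regression function in D dimensions with N samples."""
    return N ** (-2 * p / (2 * p + D))

if __name__ == "__main__":
    N, p = 1_000_000, 2.0
    for D in (1, 10, 100):
        print(f"D={D:>3}: rate ~ {minimax_rate(N, p, D):.3e}")
    # The dimension-free N^(-1/2) rate, for comparison:
    print(f"N^(-1/2): rate ~ {N ** -0.5:.3e}")
```

Even with a million samples, the guaranteed error bound at D = 100 is orders of magnitude worse than at D = 1, which is why recovering a low intrinsic dimension matters.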
Keywords: dimension reduction, regression, approximation theory, curse of dimensionality