
dc.contributor.advisor: Budavari, Tamas
dc.creator: Li, Gongkai
dc.date.accessioned: 2019-10-30T23:56:12Z
dc.date.available: 2019-10-30T23:56:12Z
dc.date.created: 2019-08
dc.date.issued: 2019-05-20
dc.date.submitted: August 2019
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/61951
dc.description.abstract: This dissertation concerns learning from dissimilarities. That is, instead of features, we observe only dissimilarities between pairs of objects for inference. The dissertation is composed of three parts: classical multidimensional scaling under noise, denoising datasets with Randomer Forests, and utilizing dissimilarities for the task of graph matching. Classical multidimensional scaling (CMDS) is a widely used method in dimensionality reduction and manifold learning. The method takes in a dissimilarity matrix and outputs a low-dimensional configuration matrix based on a spectral decomposition. In this dissertation, we present three noise models and analyze the resulting configuration matrices, or embeddings. In particular, we show that under each of the three noise models the resulting embedding gives rise to a central limit theorem. We also provide compelling simulations and real-data illustrations of these central limit theorems. This perturbation analysis represents a significant advancement over previous results on the behavior of classical multidimensional scaling under randomness. CMDS is a special case of a more general procedure called manifold learning, which is essentially required to achieve quality inferences for modern high-dimensional datasets. Many manifold learning methods have been proposed, each with its own advantages and disadvantages. In this dissertation, building on recent advances in supervised learning, we modify the leading supervised decision forest method to support unsupervised learning, and therefore also nonlinear manifold learning. The key differentiator between our Unsupervised Randomer Forest (URerF) and other manifold learning techniques is that URerF operates on low-dimensional sparse linear combinations of features, rather than either the full observed dimensionality or one-dimensional marginals. We quantify the efficacy of URerF by computing precision-recall curves relative to the true latent manifold or class label (when it is known). Empirical results on simulated data demonstrate that URerF is robust to high-dimensional noise, whereas other methods, such as Isomap and UMAP, quickly deteriorate in such settings.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.subject: Classical Multidimensional Scaling, Central Limit Theorem, Random Forest, Dissimilarity Matrix, Noise Model
dc.title: Dissimilarity Learning under Noise: Classical Multidimensional Scaling and Randomer Forests
dc.type: Thesis
thesis.degree.discipline: Applied Mathematics & Statistics
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
dc.date.updated: 2019-10-30T23:56:12Z
dc.type.material: text
thesis.degree.department: Applied Mathematics and Statistics
dc.contributor.committeeMember: Priebe, Carey E.
dc.contributor.committeeMember: Tang, Minh H.
dc.contributor.committeeMember: Athreya, Avanti
dc.publisher.country: USA
dc.creator.orcid: 0000-0001-6473-3930
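
The abstract describes classical multidimensional scaling as taking a dissimilarity matrix and producing a low-dimensional configuration matrix via a spectral decomposition. For readers unfamiliar with that procedure, below is a minimal NumPy sketch of the standard (Torgerson) CMDS algorithm; it is an illustration of the textbook method, not the dissertation's code, and the function name classical_mds, the circle example, and the noise level are placeholder assumptions.

    import numpy as np

    def classical_mds(D, d):
        """Embed an n x n dissimilarity matrix D into R^d via classical MDS.

        Standard textbook procedure shown for illustration; the noise
        models analyzed in the dissertation are not implemented here.
        """
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
        B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
        eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
        idx = np.argsort(eigvals)[::-1][:d]      # keep the top-d eigenpairs
        # Configuration matrix: eigenvectors scaled by sqrt of (clipped) eigenvalues
        return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

    # Hypothetical usage: pairwise Euclidean distances of noisy points on a circle
    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 100)
    X = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
    X += 0.05 * rng.standard_normal(X.shape)     # additive noise on the features
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    embedding = classical_mds(D, d=2)            # 100 x 2 configuration matrix

The clipping of negative eigenvalues reflects the fact that a noisy dissimilarity matrix need not be Euclidean; the dissertation's perturbation analysis concerns how noise of this kind propagates into the resulting embedding.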

