    Dissimilarity Learning under Noise: Classical Multidimensional Scaling and Randomer Forests

    LI-DISSERTATION-2019.pdf (2.369Mb)
    Date
    2019-05-20
    Author
    Li, Gongkai
    ORCID
    0000-0001-6473-3930
    Abstract
    This dissertation is mainly concerned with learning from dissimilarities: instead of features, we observe only the dissimilarities between pairs of objects and must draw inferences from them. It is composed of three parts: classical multidimensional scaling under noise, denoising datasets with Randomer Forests, and using dissimilarities for the task of graph matching.

    Classical multidimensional scaling (CMDS) is a widely used method in dimensionality reduction and manifold learning. The method takes a dissimilarity matrix as input and outputs a low-dimensional configuration matrix based on a spectral decomposition. In this dissertation, we present three noise models and analyze the resulting configuration matrices, or embeddings. In particular, we show that under each of the three noise models the resulting embedding gives rise to a central limit theorem. We also provide compelling simulations and real-data illustrations of these central limit theorems. This perturbation analysis represents a significant advancement over previous results on the behavior of classical multidimensional scaling under randomness.

    CMDS is a special case of a more general procedure called manifold learning, which is essentially required to achieve quality inferences for modern high-dimensional datasets. Many manifold learning methods have been proposed, each with its own advantages and disadvantages. In this dissertation, building on recent advances in supervised learning, we modify the leading supervised decision forest method to support unsupervised learning, and therefore also nonlinear manifold learning. The key differentiator between our Unsupervised Randomer Forest (URerF) and other manifold learning techniques is that URerF operates on low-dimensional sparse linear combinations of features, rather than on either the full observed dimensionality or one-dimensional marginals. We quantify the efficacy of URerF by computing precision-recall curves relative to the true latent manifold or class label (when it is known). Empirical results on simulated data demonstrate that URerF is robust to high-dimensional noise, whereas other methods, such as Isomap and UMAP, quickly deteriorate in such settings.
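
    For reference, the CMDS procedure described in the abstract (double-centering the squared dissimilarities and taking a spectral decomposition to obtain a low-dimensional configuration) can be sketched in a few lines of Python. This is a generic, illustrative sketch, not code from the dissertation; the function name and arguments are assumptions made for the example.

    import numpy as np

    def classical_mds(D, d):
        # Illustrative classical MDS (not the dissertation's code):
        # embed an (n, n) symmetric dissimilarity matrix D into d dimensions.
        n = D.shape[0]
        # Double-center the squared dissimilarities: B = -0.5 * J D^2 J, with J = I - (1/n) 11^T
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J
        # Spectral decomposition of B; keep the d largest (nonnegative) eigenvalues
        eigvals, eigvecs = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:d]
        top_vals = np.clip(eigvals[order], 0.0, None)
        # Configuration matrix: eigenvectors scaled by the square roots of the eigenvalues
        return eigvecs[:, order] * np.sqrt(top_vals)

    For example, applying classical_mds with d = 2 to a matrix of pairwise Euclidean distances recovers the original planar configuration up to rotation, reflection, and translation; the dissertation's contribution is to characterize how such embeddings behave when the dissimilarities are observed under noise.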
    URI
    http://jhir.library.jhu.edu/handle/1774.2/61951
    Collections
    • ETD -- Doctoral Dissertations
