STATISTICAL INFERENCE ON STRUCTURED HIGH-DIMENSIONAL MODELS USING LIKELIHOOD-BASED METHODS
Johns Hopkins University
In contemporary statistics, datasets are often high-dimensional and arise from high-dimensional statistical models, in which the dimension of the parameter space can be comparable to or even significantly greater than the sample size. For such data, statistical inference is not feasible without additional structural assumptions. One of the predominant collections of inference methods for structured high-dimensional models is based on spectral methods, such as spectral clustering for stochastic block models, or on penalized spectral methods, such as sparse principal component analysis. In contrast, likelihood-based methods for such non-classical statistical models are relatively under-explored. This dissertation aims to develop easy-to-implement likelihood-based inference methods for certain structured high-dimensional statistical models, together with the corresponding theoretical understanding of these methods. The first major contribution of this dissertation is the development of a novel matrix shrinkage prior for Bayesian estimation of jointly sparse spiked covariance matrices in high dimensions. The spiked covariance matrix is reparameterized in terms of the latent factor model, where the loading matrix is assigned a novel matrix shrinkage spike-and-slab LASSO prior. We study the posterior contraction rate of the principal subspace with respect to the two-to-infinity norm loss, a novel loss function that measures the distance between subspaces and is able to capture element-wise eigenvector deviations. The second contribution of this dissertation is the development of likelihood-based inference methods for the random dot product graph model. Both global and local estimation are considered. For the global estimation task, a minimax lower bound is established and shown to be achieved by a Bayesian method, referred to as posterior spectral embedding.
We also design a Metropolis-Hastings sampler for convenient posterior computation. For the local estimation task, we first define local efficiency rigorously and then propose a novel one-step procedure that takes advantage of the derivative information of the likelihood function of the graph model. Furthermore, we establish the local efficiency of the proposed one-step estimator. In contrast, the widely adopted adjacency spectral embedding method is proven to be locally inefficient.
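As a point of reference for the spectral baseline mentioned above, the following is a minimal sketch of adjacency spectral embedding on a simulated random dot product graph, with the estimation error measured in the two-to-infinity norm (the largest Euclidean row norm) after orthogonal alignment. It is not the dissertation's proposed method. The function names, the simulated latent positions, and the sizes n and d are illustrative assumptions that do not come from the text.

```python
import numpy as np

def two_to_infinity_norm(M):
    """Two-to-infinity norm: the largest Euclidean norm among the rows of M."""
    return np.sqrt((M ** 2).sum(axis=1)).max()

def adjacency_spectral_embedding(A, d):
    """Top-d eigenvectors of A (by eigenvalue magnitude), scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def procrustes_align(X_hat, X):
    """Rotate X_hat by the orthogonal W minimizing ||X_hat W - X||_F."""
    U, _, Vt = np.linalg.svd(X_hat.T @ X)
    return X_hat @ (U @ Vt)

# Assumed simulation setup (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, d = 1000, 2
theta = rng.uniform(0.0, np.pi / 2, size=n)
# Latent positions on a quarter circle of radius 0.8, so inner products lie in [0, 0.64].
X = 0.8 * np.column_stack([np.cos(theta), np.sin(theta)])
P = X @ X.T  # edge probability matrix of the random dot product graph

# Sample a symmetric, hollow adjacency matrix with independent Bernoulli(P_ij) edges.
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

# Latent positions are identifiable only up to an orthogonal transformation,
# so the estimate is aligned to the truth before the error is computed.
X_hat = procrustes_align(adjacency_spectral_embedding(A, d), X)
err = two_to_infinity_norm(X_hat - X)
print(f"two-to-infinity error of the spectral embedding: {err:.3f}")
```

Because the two-to-infinity norm takes a maximum over rows, it controls the error of every vertex's estimated latent position simultaneously, which an averaged Frobenius-type loss does not.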
Structured high-dimensional model, Bayesian methods, one-step estimator, structured covariance matrix, network model