Statistical Inference on Multiple Graphs
Johns Hopkins University
Given multiple graphs, an important question is how to perform statistical inference on them. This question has become more pressing with the recent explosion of graph data and the increasing complexity of data analysis. Successfully addressing it would have a large impact on scientific fields including neuroscience, social network analysis, and internet mapping. Graphs are naturally complex objects with intrinsic topological structure, which poses significant challenges for traditional statistical inference. Therefore, graph pre-processing, feature extraction, and dimension reduction are essential for good subsequent inference performance. In this dissertation, I develop pre-processing, feature extraction, and dimension reduction methods for data taking the form of multiple graphs. The methods are motivated by classical statistical approaches including analysis of variance, feature screening, and principal component analysis. Some methods can be applied in both supervised and unsupervised settings; others are designed only for problems involving labels of interest. I analyze the theoretical properties of these methods jointly with subsequent inference performance under suitable random graph models. Simulations, which include graph clustering, classification, and regression, are provided to demonstrate the properties of the proposed methods. I further apply the methods to real data sets such as human brain networks acquired through neuroimaging techniques. The main contribution of this dissertation is a set of methods for analyzing multiple graphs, supported by theory and numerical experiments, whose utility I demonstrate by exploring real data sets and discovering statistical patterns.