THE SPATIAL INDUCTIVE BIAS OF DEEP LEARNING
Mitchell, Benjamin R.
In the past few years, deep learning has become the method of choice for producing state-of-the-art results on machine learning problems involving images, text, and speech. The explosion of interest in these techniques has produced a large number of successful applications of deep learning, but relatively few studies exploring the nature of, and reasons for, that success. This dissertation is motivated by a desire to understand and reproduce the performance characteristics of deep learning systems, particularly Convolutional Neural Networks (CNNs). One factor in the success of CNNs is that they have an inductive bias assuming a certain type of spatial structure is present in the data. We give a formal definition of how this type of spatial structure can be characterized, along with statistical tools for testing whether it is present in a given dataset. We apply these tools to several standard image datasets and analyze the results. We demonstrate that CNNs rely heavily on the presence of such structure, and then show several ways that a similar bias can be introduced into other methods. The first is a partition-based method for training Restricted Boltzmann Machines and Deep Belief Networks, which speeds up convergence significantly without changing the overall representational power of the network. The second is a deep, partitioned version of Principal Component Analysis, which demonstrates that a spatial bias can be useful even in a model that is non-connectionist and completely linear. The third is a variation on projective Random Forests, which shows that a spatial bias can be introduced with only minor changes to the algorithm and no externally imposed partitioning. In each case, we show that introducing a spatial bias improves performance on spatial data.
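
To make the idea of testing for spatial structure concrete, the following is a minimal sketch of one such test: a permutation test comparing the correlation between horizontally adjacent pixels against a baseline in which pixel positions have been randomly shuffled. This is an illustrative stand-in, not the dissertation's actual statistical procedure; the choice of statistic and the permutation scheme are assumptions.

    # Minimal sketch (assumed, illustrative): permutation test for spatial
    # structure based on adjacent-pixel correlation.
    import numpy as np

    def adjacent_pixel_correlation(images):
        """Mean Pearson correlation between each pixel and its right neighbor."""
        n = len(images)
        left = images[:, :, :-1].reshape(n, -1)
        right = images[:, :, 1:].reshape(n, -1)
        # Standardize each pixel position across the dataset, then average
        # the per-position correlations of neighboring pairs.
        left = (left - left.mean(0)) / (left.std(0) + 1e-8)
        right = (right - right.mean(0)) / (right.std(0) + 1e-8)
        return float((left * right).mean())

    def spatial_structure_test(images, n_permutations=100, seed=0):
        """One-sided permutation test: is the observed adjacent-pixel
        correlation higher than expected under random pixel placement?"""
        rng = np.random.default_rng(seed)
        observed = adjacent_pixel_correlation(images)
        n, h, w = images.shape
        null = []
        for _ in range(n_permutations):
            perm = rng.permutation(h * w)  # same shuffle for every image
            shuffled = images.reshape(n, -1)[:, perm].reshape(n, h, w)
            null.append(adjacent_pixel_correlation(shuffled))
        # Fraction of permuted datasets at least as correlated as the original.
        p = (np.sum(np.array(null) >= observed) + 1) / (n_permutations + 1)
        return observed, p

    # Example on synthetic data: smooth images should show spatial structure.
    rng = np.random.default_rng(0)
    noise = rng.normal(size=(200, 16, 16))
    smooth = np.cumsum(noise, axis=2)       # horizontally correlated columns
    print(spatial_structure_test(smooth))   # high correlation, small p-value
    print(spatial_structure_test(noise))    # near-zero correlation, large p-value

Shuffling destroys only the arrangement of pixels, not their marginal statistics, so a statistic that survives the shuffle cannot be attributed to spatial layout.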
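Similarly, the following sketches the general idea behind a deep, partitioned PCA: PCA is fit independently on small spatial blocks, a few components are kept per block, and the resulting codes are arranged on a coarser grid so the process can be repeated. The block size, component counts, and stacking scheme here are illustrative assumptions, not the dissertation's exact construction.

    # Minimal sketch (assumed, illustrative): a two-layer partitioned PCA.
    import numpy as np

    def pca_fit_transform(X, k):
        """Project X (n_samples, n_features) onto its top-k principal components."""
        Xc = X - X.mean(0)
        # SVD of the centered data gives the principal directions.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ vt[:k].T

    def partitioned_pca_layer(images, block=4, k=4):
        """One layer: PCA per non-overlapping spatial block. The output is a
        coarser grid whose k channels at each cell are that block's PCA code."""
        n, h, w = images.shape[:3]
        gh, gw = h // block, w // block
        out = np.empty((n, gh, gw, k))
        for i in range(gh):
            for j in range(gw):
                patch = images[:, i*block:(i+1)*block, j*block:(j+1)*block]
                out[:, i, j] = pca_fit_transform(patch.reshape(n, -1), k)
        return out

    # Stack two layers: 32x32 pixels -> 8x8 grid of codes -> 2x2 grid of codes.
    rng = np.random.default_rng(0)
    images = rng.normal(size=(500, 32, 32))
    layer1 = partitioned_pca_layer(images, block=4, k=4)   # (500, 8, 8, 4)
    layer2 = partitioned_pca_layer(layer1, block=4, k=8)   # (500, 2, 2, 8)
    print(layer1.shape, layer2.shape)

The spatial bias enters purely through the partitioning: each PCA sees only spatially local inputs, mirroring the local receptive fields of a CNN, while the model remains linear and non-connectionist.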