Distributional Approximation of the Classification Accuracy and Gaussian Mixture Models for Deep Learning
While Deep Neural Networks (DNNs) have led to significant improvements in many recognition tasks, their inherent structure and training algorithm have not changed significantly from those of traditional neural networks: a sequence of linear and non-linear layers is trained jointly using the backpropagation algorithm. The idea of deep learning can be compared to the idea of flow diagrams and pipelines in engineering design. From this point of view, the components of a deep net are not restricted to simple linear or non-linear functions. More complicated components can be designed for DNNs and trained jointly within the deep structure. This thesis introduces two such components: the Gaussian Mixture Model (GMM) layer and a novel Classification layer. Each neuron in the GMM layer outputs a GMM likelihood. This opens several new possibilities for deep learning. First, it provides deep density estimation using a mixture of Gaussian distributions. In addition, it makes trainable non-linearities possible, whose parameters are learned over the course of training. The Classification layer preserves a simple view of the classification objective: finding the parameters that minimize the probability of classification error. In the proposed two-step process, the distribution of the classification accuracy is first approximated and then used to compute the classification error. The classification parameters are then chosen to minimize this approximated error. Using the Central Limit Theorem, it is shown that the Classification layer parameters obey a closed-form equation in the case of binary classification. For the multiclass case, the parameters are trained using the backpropagation algorithm. Both layers are evaluated on several recognition tasks, including speech, image, and disease recognition. Competitive results are demonstrated using these components with state-of-the-art recognition techniques in these tasks.
Furthermore, a significant part of this thesis discusses the theoretical aspects of the proposed components.
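
To make the notion of a GMM layer more concrete, the following is a minimal sketch of a forward pass in which each neuron evaluates a Gaussian mixture on its input. It is an illustration under stated assumptions, not the implementation used in the thesis: the function names `gmm_log_likelihood` and `gmm_layer`, the diagonal-covariance parameterization, and the choice to return the log-likelihood (for numerical stability) rather than the raw likelihood are all assumptions made here.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of input vector x under one diagonal-covariance GMM.

    Hypothetical parameterization (an assumption, not taken from the thesis):
      weights   -- mixture weights, positive and summing to 1
      means     -- one mean vector per mixture component
      variances -- one vector of per-dimension variances per component
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log-density of a diagonal Gaussian: sum over independent dimensions.
        log_pdf = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp over components, shifted by the maximum to avoid underflow.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def gmm_layer(x, neurons):
    """Forward pass of a sketch GMM layer: one log-likelihood per neuron.

    neurons is a list of (weights, means, variances) tuples, one per neuron.
    """
    return [gmm_log_likelihood(x, w, mu, var) for (w, mu, var) in neurons]


# Example: a layer with two neurons acting on a 2-dimensional input.
example_neurons = [
    ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    ([1.0], [[0.0, 0.0]], [[2.0, 2.0]]),
]
example_output = gmm_layer([0.5, -0.5], example_neurons)
```

In a trainable version, the weights, means, and variances of every neuron would be updated by backpropagation along with the rest of the network. A framework with automatic differentiation would typically handle the gradients, which this sketch does not attempt.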