Visual asymmetry in artificial neural networks
Shah, Aalap Dhiraj
Visual search is among the most developed research topics in human perception, but in machine learning and computer vision, research on visual search in neural networks is relatively new. Most tasks in that domain focus on identifying or recognizing photo-realistic objects in real-world images. In contrast, a typical experiment aimed at understanding human visual search asks the participant to search for a target among a set of distractors, where both the target and the distractors are simplified stimuli such as X's and O's or basic shapes. The long-term goal of this project is to use machine learning to explain asymmetries in human search performance. A search asymmetry occurs when performance (measured as latency or accuracy) differs when the roles of target and distractor are swapped; for example, searching for a Q among many O's is easier than searching for an O among many Q's. Explaining these asymmetries is important because existing accounts of known asymmetries are largely ad hoc, specific to individual cases, and lack computational specificity. Reaching this goal first requires evaluating the performance of an artificial neural network on different forms of visual search tasks and then comparing it with human performance on the same tasks. The research presented in this thesis began with the aim of finding a performance measure for the network. However, we found that, despite the classification prowess of the convolutional neural network (CNN), teaching a pre-trained CNN to identify a unique target among a set of distractors was not trivial. Therefore, this thesis presents only the approaches we undertook to improve the CNN's performance on this task and discusses their implications.