dc.contributor.advisor: Meyer, Gerard G. L. (en_US)
dc.creator: Borges, Nash (en_US)
dc.date.accessioned: 2014-12-23T04:36:46Z
dc.date.available: 2014-12-23T04:36:46Z
dc.date.created: 2014-05 (en_US)
dc.date.issued: 2014-02-06 (en_US)
dc.date.submitted: May 2014 (en_US)
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/36963
dc.description.abstract: Our goal is to develop a robust anomaly detector that can be incorporated into pattern recognition systems that may need to learn, but will never be shunned for making egregious errors. The ability to know what we do not know is a concept often overlooked when developing classifiers to discriminate between different types of normal data in controlled experiments. We believe that an anomaly detector should be used to produce warnings in real applications when operating conditions change dramatically, especially when other classifiers only have a fixed set of bad candidates from which to choose. Our approach to distributional anomaly detection is to gather local information using features tailored to the domain, aggregate all such evidence to form a global density estimate, and then compare it to a model of normal data. A good match to a recognizable distribution is not required. By design, this process can detect the "unknown unknowns" [1] and properly react to the "black swan events" [2] that can have devastating effects on other systems. We demonstrate that our system is robust to anomalies that may not be well-defined or well-understood even if they have contaminated the training data that is assumed to be non-anomalous. In order to develop a more robust speech activity detector, we reformulate the problem to include acoustic anomaly detection and demonstrate state-of-the-art performance using simple distribution modeling techniques that can be used at incredibly high speed. We begin by demonstrating our approach when training on purely normal conversational speech and then remove all annotation from our training data and demonstrate that our techniques can robustly accommodate anomalous training data contamination. When comparing continuous distributions in higher dimensions, we develop a novel method of discarding portions of a semi-parametric model to form a robust estimate of the Kullback-Leibler divergence.
Finally, we demonstrate the generality of our approach by using the divergence between distributions of vertex invariants as a graph distance metric and achieve state-of-the-art performance when detecting graph anomalies with neighborhoods of excessive or negligible connectivity.
[1] D. Rumsfeld. (2002) Transcript: DoD news briefing - Secretary Rumsfeld and Gen. Myers.
[2] N. N. Taleb, The Black Swan: The Impact of the Highly Improbable. Random House, 2007. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.language: en
dc.publisher: Johns Hopkins University
dc.subject: Anomaly Detection (en_US)
dc.subject: Divergence Estimation (en_US)
dc.subject: Speech Activity Detection (en_US)
dc.subject: Random Graphs (en_US)
dc.title: Robust Anomaly Detection with Applications to Acoustics and Graphs (en_US)
dc.type: Thesis (en_US)
thesis.degree.discipline: Electrical Engineering (en_US)
thesis.degree.grantor: Johns Hopkins University (en_US)
thesis.degree.grantor: Whiting School of Engineering (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Ph.D. (en_US)
dc.type.material: text (en_US)
thesis.degree.department: Electrical and Computer Engineering (en_US)
dc.contributor.committeeMember: Hermansky, Hynek (en_US)
dc.contributor.committeeMember: Khudanpur, Sanjeev P. (en_US)
dc.contributor.committeeMember: Coppersmith, Glen A. (en_US)
dc.contributor.committeeMember: Godfrey, John J. (en_US)