Robust Anomaly Detection with Applications to Acoustics and Graphs
MetadataShow full item record
Our goal is to develop a robust anomaly detector that can be incorporated into pattern recognition systems that may need to learn, but will never be shunned for making egregious errors. The ability to know what we do not know is a concept often overlooked when developing classifiers to discriminate between different types of normal data in controlled experiments. We believe that an anomaly detector should be used to produce warnings in real applications when operating conditions change dramatically, especially when other classifiers only have a fixed set of bad candidates from which to choose. Our approach to distributional anomaly detection is to gather local information using features tailored to the domain, aggregate all such evidence to form a global density estimate, and then compare it to a model of normal data. A good match to a recognizable distribution is not required. By design, this process can detect the "unknown unknowns"  and properly react to the "black swan events"  that can have devastating effects on other systems. We demonstrate that our system is robust to anomalies that may not be well-defined or well-understood even if they have contaminated the training data that is assumed to be non-anomalous. In order to develop a more robust speech activity detector, we reformulate the problem to include acoustic anomaly detection and demonstrate state-of-the-art performance using simple distribution modeling techniques that can be used at incredibly high speed. We begin by demonstrating our approach when training on purely normal conversational speech and then remove all annotation from our training data and demonstrate that our techniques can robustly accommodate anomalous training data contamination. When comparing continuous distributions in higher dimensions, we develop a novel method of discarding portions of a semi-parametric model to form a robust estimate of the Kullback-Leibler divergence. Finally, we demonstrate the generality of our approach by using the divergence between distributions of vertex invariants as a graph distance metric and achieve state-of-the-art performance when detecting graph anomalies with neighborhoods of excessive or negligible connectivity.  D. Rumsfeld. (2002) Transcript: DoD news briefing - Secretary Rumsfeld and Gen. Myers.  N. N. Taleb, The Black Swan: The Impact of the Highly Improbable. Random House, 2007.