Towards explaining anomalies: A deep Taylor decomposition of one-class models

Bibliographic Details
Published in:	Pattern recognition 2020-05, Vol.101, p.107198, Article 107198
Main Authors:	Kauffmann, Jacob; Müller, Klaus-Robert; Montavon, Grégoire
Format:	Article
Language:	English
Subjects:	Deep Taylor decomposition; Explainable machine learning; Kernel machines; Outlier detection; Unsupervised learning

Description
Summary:	
•We enhance the prediction of anomalies (as given by a kernel one-class SVM) by explaining them in terms of input features.
•The method is based on a reformulation of the one-class SVM as a neural network, the structure of which is better suited to the task of explanation.
•Explanations are obtained via a deep Taylor decomposition, which propagates the prediction backward in the neural network towards the input features.
•Application of our method to image data highlights pixel-level anomalies that can be missed by a simple visual inspection.

Detecting anomalies in data is a common machine learning task, with numerous applications in the sciences and industry. In practice, reaching high detection accuracy is not always sufficient; one would also like to understand why a given data point has been predicted to be anomalous. We propose a principled approach for one-class SVMs (OC-SVM) that draws on the novel insight that these models can be rewritten as distance/pooling neural networks. This ‘neuralization’ step lets us apply deep Taylor decomposition (DTD), a methodology that leverages the model structure to explain decisions quickly and reliably in terms of input features. The proposed method (called ‘OC-DTD’) is applicable to a number of common distance-based kernel functions, and it outperforms baselines such as sensitivity analysis, distance to nearest neighbor, and edge detection.
ISSN:	0031-3203, 1873-5142
DOI:	10.1016/j.patcog.2020.107198
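
The ‘neuralization’ idea mentioned in the summary can be illustrated with a small numerical sketch. It assumes a Gaussian kernel and uses made-up support vectors and weights; the function names, the log-based outlier score, and all parameter values are hypothetical and are not taken from the article, whose exact formulation may differ. The sketch only checks that a log-transformed kernel expansion equals a distance layer followed by a soft-min pooling layer. It does not implement the deep Taylor decomposition step itself.

```python
import numpy as np

def kernel_score(x, support_vectors, alphas, gamma):
    """Gaussian-kernel expansion: sum_j alpha_j * exp(-gamma * ||x - u_j||^2)."""
    sq_dists = np.sum((support_vectors - x) ** 2, axis=1)
    return np.sum(alphas * np.exp(-gamma * sq_dists))

def neuralized_outlier_score(x, support_vectors, alphas, gamma):
    """Same quantity as -log(kernel_score) / gamma, written as two layers.

    Assumes every alpha_j is strictly positive so that log(alpha_j) is defined.
    """
    # Layer 1 (distance): squared distance to each support vector,
    # shifted by an offset that encodes the weight alpha_j.
    h = np.sum((support_vectors - x) ** 2, axis=1) - np.log(alphas) / gamma
    # Layer 2 (pooling): soft minimum over h, computed in a numerically stable way.
    h_min = np.min(h)
    return h_min - np.log(np.sum(np.exp(-gamma * (h - h_min)))) / gamma

# Hypothetical toy model: 5 support vectors in 3 dimensions.
rng = np.random.default_rng(0)
support_vectors = rng.normal(size=(5, 3))
alphas = rng.uniform(0.1, 1.0, size=5)
alphas /= alphas.sum()
gamma = 0.5

x = rng.normal(size=3)
direct = -np.log(kernel_score(x, support_vectors, alphas, gamma)) / gamma
layered = neuralized_outlier_score(x, support_vectors, alphas, gamma)
print(np.isclose(direct, layered))
```

Under these assumptions the two scores agree up to floating-point error, and a point far from every support vector receives a larger score than a point located at a support vector.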