Loading…

Geometric anomaly detection in data

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small in...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings of the National Academy of Sciences - PNAS 2020-08, Vol.117 (33), p.19664-19669
Main Authors: Stolz, Bernadette J., Tanner, Jared, Harrington, Heather A., Nanda, Vidit
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.
ISSN:0027-8424
1091-6490
DOI:10.1073/PNAS.2001741117