H‐type indices with applications in chemometrics II: h‐outlyingness index

Bibliographic Details
Published in: Journal of Chemometrics, 2021-11, Vol. 35 (11), p. n/a
Main Authors: Yang, Qin; Xu, Lu; Tian, Guo‐Li; Wu, Ben‐Qing
Format: Article
Language: English
Description
Summary: An outlier is generally considered to be a data point that deviates from the “bulk” of the data. Outlier diagnosis raises two questions: (1) how far is an object from the bulk, and (2) how many data points does the “bulk” include? To address both questions simultaneously, the h‐outlyingness index (HOI) is defined as follows: for a given data point in a data set of N data points, if at most M% of its (N − 1) one‐to‐rest distances are no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value of that data point is M%. HOI was applied to outlier diagnosis in simulated and real data sets, and the results were compared with those obtained by several robust statistical methods; HOI gave results similar to those of the traditional methods. For high‐dimensional data, it was advisable to compute HOI after dimension reduction by methods such as principal component analysis (PCA). HOI was demonstrated to be a simple, easy‐to‐compute, robust, and effective index for outlier diagnosis. Moreover, HOI is a nonparametric method that makes no assumptions about the data distribution, which makes it useful in chemometrics for multivariate outlier diagnosis.
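The definition in the summary can be read in more than one way, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that "no less than M% of all the pairwise distances" means "at least as large as the M-th percentile of the pairwise distances", that distances are Euclidean, and that M is searched over whole percents. The function name `hoi_scores` is hypothetical, and `np.percentile` is used with its default interpolation.

```python
import numpy as np

def hoi_scores(X):
    """Return one HOI-style score (in percent) per row of X.

    Assumed reading: the score of point i is the largest M such that at
    least M% of its N-1 distances to the other points are >= the M-th
    percentile of all N(N-1)/2 pairwise distances.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Euclidean distance matrix (the choice of metric is an assumption).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # All N(N-1)/2 pairwise distances, each counted once.
    pairwise = D[np.triu_indices(n, k=1)]
    # Candidate M values: whole percents from 0 to 100 (an assumed grid).
    levels = np.arange(0, 101)
    thresholds = np.percentile(pairwise, levels)
    scores = np.empty(n)
    for i in range(n):
        rest = np.delete(D[i], i)  # the N-1 one-to-rest distances
        # Number of one-to-rest distances that reach each threshold.
        counts = (rest[None, :] >= thresholds[:, None]).sum(axis=1)
        # Integer comparison of counts/(N-1) >= M/100 avoids rounding issues.
        reached = counts * 100 >= levels * (n - 1)
        scores[i] = levels[reached].max()
    return scores

# Toy data: 20 clustered points plus one point placed far away.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 2)), [[50.0, 50.0]]])
scores = hoi_scores(X)
print(scores.argmax(), scores[-1])
```

Under this reading, a point whose distances to all other points are among the largest pairwise distances receives a score close to 100%, while points inside the bulk receive lower scores.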
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/cem.3375